Given an arbitrarily long but finite sequence of observations from
a finite set, we construct a simple process that approximates the
sequence, in the sense that with high probability the empirical
frequency, as well as the empirical one-step transitions along a
realization from the approximating process, are close to those of the
given sequence.

We generalize the result to the case where the one-step transitions 
are required to be in given polyhedra. 

1. Introduction. In a seminal work, Baum and Petrie (1966) studied the
following problem: can one recover a homogeneous hidden Markov chain
from a finite sample $(x_0, x_1, \dots, x_N)$ from the chain? They prove that the
maximum likelihood estimate converges to the correct value as $N$ goes to
infinity.

This problem has several applications, including ecology [Baum and Eagon 
(1967)], speech recognition [see, e.g., Rabiner (1989)] and identifying gene 
structure [see, e.g., Krogh, Mian and Haussler (1994)]. 

We study the following related problem. Can one find a "simple" process
$(s_n)$ that "explains" a given observation $(x_0, \dots, x_N)$? More specifically,
we are given a finite sequence $(x_0, \dots, x_N)$ out of a finite set $S$, and we would
like to find a simple $S$-valued process $(s_n)$ that satisfies the following two
properties:

(i) under $(s_n)_{n<N}$, with high probability, the empirical frequency of $s \in S$
is close to the frequency of stages $m < N$ such that $x_m = s$, and
(ii) the conditional law of $s_{n+1}$, given $(s_0, \dots, s_n)$, is close to the empirical
frequency of one-step transitions from $s_n$ to $s_{n+1}$ in $(x_0, \dots, x_N)$ (i.e., the
frequency of stages $m < N$ such that $x_{m+1} = s_{n+1}$ out of the stages $m < N$
such that $x_m = s_n$).

Were only property (i) required, an i.i.d. sequence would do. The
simplest processes that allow for serial correlation are homogeneous Markov
chains. Therefore, a naive solution to this problem is to define $(s_n)$ to be
the homogeneous Markov chain in which the transition from $s$ to $s'$ is the
frequency of stages $m < N$ such that $x_{m+1} = s'$ out of the stages $m < N$
such that $x_m = s$. It is true that asymptotically this Markov chain satisfies
our requirements. However, we wish to have an approximation at time $N$,
where $N$ is the number of observations, and the naive Markov chain may fail
to provide one. The concept of a simple process we use is therefore slightly more
complicated: a simple process in our context is a piecewise homogeneous
Markov chain with a bounded number of pieces. Our basic result states that,
provided $N$ is large enough, every sequence can be explained, in the above
sense, by a piecewise homogeneous Markov chain with at most $|S|$ pieces.
Our proof is constructive, in the sense that we provide an algorithm that
produces the desired piecewise homogeneous Markov chain.

We also analyze a more general question. It is sometimes the case that the
process we construct has to satisfy some exogenous constraints, for example,
the one-step transitions must belong to some pre-defined polyhedra of
probability measures. These polyhedra may reflect some a priori knowledge
of the physics of the problem at hand. We then have to construct a process
such that both (i) and (ii) are satisfied, and, in addition, the conditional law
of $s_{n+1}$, given $(s_0, \dots, s_n)$, must belong to some polyhedron $V(s_n)$. So that
(ii) will hold, the empirical transitions along the observed sequence must be
close to the given polyhedra. We prove that under proper conditions, and
if $N$ is large enough, there exists a piecewise hidden Markov chain with a
bounded number of pieces that satisfies these three requirements.

A consequence of our result is the following. Let $(z_n)$ be any $S$-valued
process such that the conditional law of $z_{n+1}$, given $(z_0, \dots, z_n)$, belongs to
some given polyhedron $V(z_n)$, a.s. for each $n < N$. Assume, moreover, that
there is an irreducible transition function $b$ such that $b(s, \cdot) \in V(s)$ for every
$s$. Then, provided $N$ is large enough, for most realizations $(x_0, x_1, \dots, x_N)$ of
$(z_n)$ one can find a piecewise homogeneous hidden Markov chain $(s_n)$ with
at most $|S|$ pieces such that both (i) and (ii) above hold, and the conditional
law of $s_{n+1}$, given $(s_0, \dots, s_n)$, belongs to $V(s_n)$, a.s. for every $n < N$. More
precisely, the measure of the set of realizations that can be explained in
the sense we just described goes to 1 as $N$ goes to infinity. Thus, most
realizations from $(z_n)$ can be explained by a simple process. In other words,
assuming only that the sequence $(x_0, \dots, x_N)$ is generated by a process that
satisfies the given physical constraints, it is very likely that a simple process
can be found, which with high probability has the same empirical behavior
as the given sequence $(x_0, \dots, x_N)$.

The paper is organized as follows. In Section 2 we define and investigate 
the problem with no polyhedral restriction, and in Section 3 we turn to the 
general problem. 

2. The basic problem. Given a finite set $K$, we let $|K|$ denote the number
of elements in $K$, and $\mathcal{P}(K)$ denote the space of probability distributions
over $K$. Throughout the paper we fix a finite set $S$ of states. We use the
symbol "$\subset$" to denote strict inclusion. For every subset $C \subseteq S$, $\bar{C} = S \setminus C$ is
the complement of $C$ in $S$.

2.1. Presentation. The basic problem can be stated as follows. A sequence
$x = (x_0, x_1, \dots, x_N)$ in $S$ with finite length $N + 1 \in \mathbb{N}$ is given. An
observer gets to see this sequence, or at least gets to know the number
$N^x_{s,t} = |\{n < N : (x_n, x_{n+1}) = (s,t)\}|$ of one-step transitions from $s$ to $t$, for
each $s,t \in S$. The observer wishes to find a simple stochastic process $(z_n)_n$
over $S$, such that any typical realization of $z_0, \dots, z_N$ fits the data. We proceed
to give a formal meaning to this question before we state our basic
result.

For $s \in S$, denote by

    $N^x_s = \sum_{t \in S} N^x_{s,t} = |\{n < N : x_n = s\}|$   and   $\nu^x(s) = \frac{N^x_s}{N}$

the number of stages spent in $s$ along $x$ (excluding $x_N$), and the observed
occupancy measure of $s$, respectively. The observed transition function $p^x$
is defined by

(1)  $p^x(s,t) = \frac{N^x_{s,t}}{N^x_s}$   for each $s,t \in S$ s.t. $N^x_s > 0$.

If $N^x_s = 0$, the definition of $p^x(s, \cdot) \in \mathcal{P}(S)$ is irrelevant.
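
As an illustration of the definitions above, the following minimal Python sketch computes the transition counts, the observed occupancy measure and the observed transition function of a short sequence. The function name `empirical_statistics` and the example sequence are assumptions made for illustration only; they do not appear in the paper.

```python
from collections import Counter

def empirical_statistics(x):
    """Return (N_st, N_s, nu, p) for the sequence x = (x_0, ..., x_N)."""
    N = len(x) - 1                     # number of one-step transitions
    N_st = Counter(zip(x, x[1:]))      # N^x_{s,t}: transitions from s to t
    N_s = Counter(x[:-1])              # N^x_s: stages n < N spent in s
    nu = {s: c / N for s, c in N_s.items()}                 # occupancy measure
    p = {(s, t): c / N_s[s] for (s, t), c in N_st.items()}  # formula (1)
    return N_st, N_s, nu, p

# Hypothetical example sequence over S = {a, b}, with N = 6.
N_st, N_s, nu, p = empirical_statistics("aababba")
print(dict(N_st), dict(N_s), nu, p)
```

Only pairs $(s,t)$ that actually occur are stored in `p`; missing pairs correspond to $p^x(s,t) = 0$, and states with $N^x_s = 0$ are simply absent, in line with the remark that their rows are irrelevant.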

The most natural notion of a simple process is that of a homogeneous 
Markov chain. As is argued in Remark 1 below, this notion is not flexible 
enough to allow for a good approximation in finite time. Thus, we introduce 
the notion of a piecewise homogeneous Markov chain. 

Definition 1. Let $K$ be a positive integer. A process $z = (z_n)_{0 \le n \le N}$ is
a piecewise homogeneous Markov chain with $K$ pieces if (i) $z$ is a Markov
chain and (ii) there exist integers $0 = n_0 < n_1 < \cdots < n_K = N$ such that the
random variables $(z_n)$, $n_{k-1} \le n < n_k$, form a homogeneous Markov chain,
for each $k = 1, \dots, K$.
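
To make the definition concrete, here is a small Python sketch that samples a piecewise homogeneous Markov chain. The function `sample_piecewise_chain`, the dictionary encoding of transition functions, and the convention that the $k$th kernel drives the transitions out of stages $n_{k-1} \le n < n_k$ are all assumptions chosen for illustration.

```python
import random

def sample_piecewise_chain(z0, breakpoints, kernels, rng):
    """Sample z_0, ..., z_N with breakpoints = [n_0, ..., n_K].

    kernels[k-1][s] maps each next state t to its probability; it is used
    for the transitions out of stages n with n_{k-1} <= n < n_k.
    """
    z = [z0]
    for k in range(1, len(breakpoints)):
        kernel = kernels[k - 1]
        for _ in range(breakpoints[k - 1], breakpoints[k]):
            states, weights = zip(*kernel[z[-1]].items())
            z.append(rng.choices(states, weights=weights)[0])
    return z

# Two pieces with deterministic kernels, so the output is easy to check by eye.
to_a = {"a": {"a": 1.0}, "b": {"a": 1.0}}
to_b = {"a": {"b": 1.0}, "b": {"b": 1.0}}
z = sample_piecewise_chain("a", [0, 3, 6], [to_a, to_b], random.Random(0))
print("".join(z))
```

Within each piece the kernel is fixed, so the chain is homogeneous there; only the $K - 1$ interior breakpoints change the law of the transitions.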

The law of an $S$-valued process $z = (z_n)_{n \le N}$ is denoted by $P_z$. If $z$ is
a (possibly nonhomogeneous) Markov chain, we denote by $p^z_n$, $n < N$, the
conditional distribution of $z_{n+1}$ given $z_n$. Also, $\nu^z_m$ is the empirical occupancy
measure in the first $m$ stages:

    $\nu^z_m(s) = \frac{1}{m} |\{0 \le n \le m-1 : z_n = s\}|$.

We are now in a position to state our basic theorem. 

Theorem 1. For every $\varepsilon > 0$, every $\delta \in (0, \frac{1}{2(4|S|+1)})$ and every $\zeta \in
(0, 2\delta)$, there exists $N_0 \in \mathbb{N}$ such that the following holds. For every $N \ge N_0$
and every $S$-valued sequence $x = (x_0, \dots, x_N)$, there is a piecewise homogeneous
Markov chain $z$ over $S$, with at most $|S|$ pieces, such that:

(B1) $P_z( |\frac{\nu^z_N(s)}{\nu^x(s)} - 1| > \varepsilon ) \le \frac{1}{N^\zeta}$ for every $s \in S$ that satisfies $\nu^x(s) \ge \frac{1}{N^\delta}$.

(B2) $P_z$-a.s., one has $\| p^z_n(z_n, \cdot) - p^x(z_n, \cdot) \|_\infty < \varepsilon$ for each $n \ne n_k - 1$.

Leaving aside the technical qualifications, Theorem 1 has the following
implications. The number of pieces of the approximating process is independent
of the length of the sequence $x$. In all stages, with the possible
exception of at most $|S|$ of them, the transition function of $z$ is very close
to the observed transition function $p^x$. Moreover, for any typical realization
of the first $N$ components of $z$, the empirical occupancy measure $\nu^z_N$ is very
close to the observed occupancy measure $\nu^x$ [restricted to states $s$ whose
observed occupancy measure is not negligible].

We stress that we consider realizations z of the same length as the se- 
quence. In that sense, our result is not an asymptotic result, but provides 
the basis for a good approximation in finite time, provided the sequence 
is long enough. Our proof is constructive, in the sense that we provide an 
algorithm that can be used to construct z. 

Observe that (B1) and (B2) are not exactly of the same nature. Indeed,
(B1) relates to the samples from $z$, while (B2) is a structural property of $z$.
From the proof it will be clear that endless variations are possible.

Remark 1. The naive solution is to consider a Markov chain $z$ with
transition function $p^x$. However, such a process may fail to yield a good
approximation in finite time. Indeed, let $S = \{a,b\}$, and consider the sequence
$x = (a,a,\dots,a,b,b,\dots,b,a)$ that contains $N$ $a$'s followed by $N$ $b$'s, and ends
with an $a$. The transition function $p^x$ is

    $p^x(b,a) = p^x(a,b) = 1 - p^x(b,b) = 1 - p^x(a,a) = 1/N$.

Given a Markov chain with transition function $p^x$ and initial state $a$, the
probability that $z_n = a$, for every $n < 2N + 1$, is bounded away from zero.
In particular, condition (B1) will not hold. More generally, no homogeneous
Markov chain satisfies both (B1) and (B2) in this example.
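
The claim that the naive chain stays at $a$ with non-vanishing probability can be checked numerically. The short sketch below assumes, as in the remark, that $p^x(a,a) = 1 - 1/N$ and that the chain makes $2N$ transitions; the function name `stay_probability` is hypothetical.

```python
import math

def stay_probability(N):
    """Probability that the naive chain, started at a, is still at a after
    each of 2N transitions, when p(a, a) = 1 - 1/N."""
    return (1.0 - 1.0 / N) ** (2 * N)

for N in (10, 100, 1000, 10000):
    print(N, stay_probability(N))
print("limit exp(-2) =", math.exp(-2))
```

The values increase towards $e^{-2}$, so with probability of roughly 13 percent the naive chain never visits $b$ at all, although $b$ accounts for about half of the observed sequence.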

This example highlights the heart of the problem. The naive solution does
satisfy (B1) and (B2) when $p^x$ is sufficiently mixing. However, when it is not,
there is no Markov chain that approximates the given sequence in the sense
of (B1) and (B2).

The proof of Theorem 1 is organized as follows. First, we provide a general
structure result in Section 2.2. When $C$ is a subset of $S$, and $x = (x_1, \dots, x_N)$
is a sequence of elements in $S$, a $C$-run is a subsequence $(x_{n_1}, x_{n_1+1}, \dots, x_{n_2})$
such that all its elements are in $C$, while $x_{n_1-1}$ and $x_{n_2+1}$ are not in $C$ (if
$n_1 = 1$ or $n_2 = N$ the last condition is vacuous). Our structure result states
that given any finite sequence $x$ of elements of $S$, there is a partition of $S$
with the property that for every atom $C$ of the partition and every proper
subset $D$ of $C$, the number of $C$-runs is much smaller than the number of $D$-runs.
Thus, the sequence moves around inside any atom much more quickly
than from one atom to another.

We will use the structure result to argue that the observed transition
function associated to $x$, when restricted to any atom of the partition,
is mixing. We then construct the simple process $z$ that approximates $x$.
This process will have the following features: (i) it visits every atom $C$ of
the partition only once, (ii) the duration of the visit to $C$ is $\sum_{s \in C} N^x_s$, the
observed number of stages spent in $C$, and (iii) the transition function of
$z$ during the visit to $C$ is $p^x$, properly modified so as to prevent the chain
from exiting $C$ too early.

Section 2.3 contains several results on Markov chains. The proof of The- 
orem 1 is given in Section 2.4. 

2.2. A structure theorem. We here collect some general notation that is
in use throughout the paper. We use the letters $p$ and $q$, with possible sub-
or superscripts, to denote transition functions. Probability measures over
$S$ are denoted by $\mu$, empirical occupancy measures over $S$ are denoted by
$\nu$, while probability measures over sequences in $S$ are denoted by $P$. Finally, random
variables are often boldfaced, while generic variables are not.

Let a finite sequence $x = (x_0, \dots, x_N)$ in $S$ be given. For every two subsets
$A, B \subseteq S$, we set

    $N^x_{A,B} = \sum_{s \in A, t \in B} N^x_{s,t}$   and   $N^x_A = N^x_{A,S} = \sum_{s \in A} N^x_s$.

These are the number of transitions from $A$ to $B$ along $x$, and the number
of visits to the set $A$ along $x$, respectively. For $C \subseteq S$, we define

    $R^x_C = N^x_{\bar{C},C} + 1_{x_0 \in C}$.

This is the number of $C$-runs along $x$ [see Feller (1968), II.5]. Plainly, $R^x_{C \setminus D} \le
R^x_C + R^x_D$ for every $D \subset C$, and $|R^x_C - R^x_{\bar{C}}| \le 1$. Note also that $R^x_C = N^x_{C,\bar{C}} + 1_{x_N \in C}$.
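
The two ways of counting $C$-runs (scanning for maximal blocks, or counting entries into $C$ and adding one if the sequence starts in $C$) can be compared on a small example. The sketch below is illustrative only; the function names and the example sequence are assumptions.

```python
def count_runs(x, C):
    """Number of maximal blocks of consecutive entries of x that lie in C."""
    runs, inside = 0, False
    for s in x:
        if s in C and not inside:
            runs += 1
        inside = s in C
    return runs

def runs_from_transitions(x, C):
    """Same count via entries into C, plus one if x starts in C."""
    entries = sum(1 for s, t in zip(x, x[1:]) if s not in C and t in C)
    return entries + (1 if x[0] in C else 0)

x = "aabcabbca"  # hypothetical sequence over S = {a, b, c}
for C in ({"a"}, {"b"}, {"c"}, {"a", "b"}):
    print(sorted(C), count_runs(x, C), runs_from_transitions(x, C))
```

Both functions agree on every subset, which is the content of the identity relating runs to transitions into the set.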

We now state our structure result. 

Theorem 2. Let $a > 0$ and a finite sequence $x$ of elements of $S$ be given.
There is a partition $\mathcal{C}$ of $S$ such that, for every $C \in \mathcal{C}$:

(P1) $R^x_C \le (a + 1)^{|\mathcal{C}|}$.

(P2) For each $D \subset C$, $R^x_D > a R^x_C$.

Proof. Since $R^x_S = 1$, the trivial partition $\mathcal{C} = \{S\}$ satisfies (P1). Among
all the partitions that satisfy (P1), let $\mathcal{C}$ be one with maximal number of
atoms, and set $k = |\mathcal{C}|$, the number of atoms in $\mathcal{C}$. We will prove that $\mathcal{C}$
satisfies (P2). If it does not, there are $C \in \mathcal{C}$, and a proper subset $D$ of $C$,
such that $R^x_D \le a R^x_C$.

Consider now the partition $(\mathcal{C} \setminus \{C\}) \cup \{D, C \setminus D\}$ obtained by further
partitioning the set $C$ into $D$ and $C \setminus D$. We show that this new partition,
with $k + 1$ elements, satisfies (P1) as well, contradicting the maximality of
$\mathcal{C}$. Indeed, $R^x_D \le a R^x_C \le (a + 1)^{k+1}$, and $R^x_{C \setminus D} \le R^x_C + R^x_D \le (a + 1) R^x_C \le
(a + 1)^{k+1}$. □
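
The argument in the proof suggests a simple refinement procedure: start from the trivial partition and split an atom whenever (P2) fails, since each such split preserves (P1). The Python sketch below follows that idea under stated assumptions: the names `count_runs`, `find_violation` and `structure_partition` are hypothetical, proper subsets are enumerated by brute force (so the sketch is only practical for small $S$), and the resulting partition need not have the maximal number of atoms used in the proof.

```python
from itertools import combinations

def count_runs(x, C):
    """Number of maximal blocks of consecutive entries of x that lie in C."""
    runs, inside = 0, False
    for s in x:
        if s in C and not inside:
            runs += 1
        inside = s in C
    return runs

def find_violation(x, partition, a):
    """Return (C, D) with D a nonempty proper subset of C and R_D <= a * R_C."""
    for C in partition:
        R_C = count_runs(x, C)
        for size in range(1, len(C)):
            for D in combinations(sorted(C), size):
                if count_runs(x, set(D)) <= a * R_C:
                    return C, frozenset(D)
    return None

def structure_partition(x, states, a):
    """Refine {S} until no atom violates (P2)."""
    partition = [frozenset(states)]
    while True:
        hit = find_violation(x, partition, a)
        if hit is None:
            return partition
        C, D = hit
        partition.remove(C)
        partition.extend([D, C - D])

print(structure_partition("a" * 50 + "b" * 50 + "a", "ab", 3))
print(structure_partition("ab" * 50, "ab", 3))
```

On the block sequence the two states end up in separate atoms, whereas the rapidly alternating sequence keeps a single atom; this matches the intuition that the sequence moves quickly inside an atom and rarely between atoms.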

As Theorem 2 has its own merit, we provide two comments concerning
the partition $\mathcal{C}$ that satisfies (P1) and (P2).

Comment. There need not be a unique partition that satisfies both
(P1) and (P2). Indeed, let $S = \{0, 1\}$ and $x = (0, 1, 0, 1, \dots, 0, 1)$ (a sequence
of length $N + 1$), and let $a > 0$ be such that $a < \frac{N+1}{2} < (a + 1)^2$. Since
$R^x_{\{0\}} = R^x_{\{1\}} = \frac{N+1}{2}$, the two partitions of $S$ satisfy (P1) and (P2).

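Comment. For $a > 2$, the partition that is defined in the proof of Theorem
2 is unique. To verify this claim, it is enough to check that, given two
partitions $\mathcal{C}$ and $\mathcal{D}$ that satisfy (P2), the following holds: for every $C \in \mathcal{C}$
and $D \in \mathcal{D}$, if the intersection $C \cap D$ is nonempty, then it is equal to either
$C$ or $D$.

Assume to the contrary that $P = C \cap D$ is a proper subset of both $C$ and
$D$. Then one has $N^x_{P,\bar{C}} + N^x_{P,C \setminus P} = N^x_{P,\bar{D}} + N^x_{P,D \setminus P} = R^x_P - 1_{x_N \in P}$,
$N^x_{P,C \setminus P} + N^x_{P,D \setminus P} \le R^x_P - 1_{x_N \in P}$, $N^x_{P,\bar{C}} \le N^x_{C,\bar{C}} = R^x_C - 1_{x_N \in C}$, and
$N^x_{P,\bar{D}} \le N^x_{D,\bar{D}} = R^x_D - 1_{x_N \in D}$. It follows that

    $R^x_P - 1_{x_N \in P} \ge N^x_{P,C \setminus P} + N^x_{P,D \setminus P}$
                          $= 2R^x_P - 2 \times 1_{x_N \in P} - N^x_{P,\bar{C}} - N^x_{P,\bar{D}}$
                          $\ge 2R^x_P - R^x_C - R^x_D$.

In particular, by (P2),

    $R^x_C + R^x_D - 1_{x_N \in P} \ge R^x_P > a \times \max\{R^x_C, R^x_D\}$,

a contradiction when $a > 2$.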

2.3. On Markov chains. We here collect a few useful results about Markov
chains. First, we provide a result on the speed of convergence of an irreducible
Markov chain to its invariant measure. Next, we make a few observations
on the expected exit time from sub-domains of $S$.

Throughout the present section, a transition function $p$ over $S$ is given.
For $s \in S$, we denote by $P_{s,p}$ the law of a homogeneous Markov chain $z$
with transition function $p$ and initial state $s$, and by $E_{s,p}$ the expectation
w.r.t. $P_{s,p}$. For $\mu \in \mathcal{P}(S)$, $E_{\mu,p} = \sum_{s \in S} \mu(s) E_{s,p}$ is the expectation operator
when the initial state is chosen according to $\mu$.

The hitting time of a set $C \subseteq S$ is $T_C = \min\{n \ge 0 : z_n \in C\}$ (with $\min \emptyset =
+\infty$). For $t \in S$, we abbreviate $T_{\{t\}}$ to $T_t$ and we denote by $T_t^+ = \min\{n \ge
1 : z_n = t\}$ the first return time to $t$.

2.3.1. Convergence to the invariant measure. 

Definition 2. Let $\gamma > 0$ be given. The transition function $p$ is $\gamma$-mixing
if $E_{s,p}[T_t^+] \le \gamma$, for every $s,t \in S$.
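
For a small state space the smallest admissible $\gamma$ can be computed exactly by solving the usual linear equations for expected hitting times. The sketch below is one possible way to do this, not a procedure from the paper: the helper names `solve`, `return_times` and `mixing_constant` are hypothetical, transition functions are encoded as nested dictionaries of `Fraction` values, and $p$ is assumed to be irreducible so that the linear systems are nonsingular.

```python
from fractions import Fraction

def solve(A, b):
    """Gauss-Jordan elimination for a small nonsingular system A h = b."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def return_times(p, states, t):
    """E_s[T_t^+] for every initial state s."""
    others = [s for s in states if s != t]
    # h(s) = E_s[T_t] for s != t solves h(s) - sum_{u != t} p(s,u) h(u) = 1.
    A = [[Fraction(int(s == u)) - p[s][u] for u in others] for s in others]
    h = dict(zip(others, solve(A, [Fraction(1)] * len(others))))
    return {s: 1 + sum(p[s][u] * h[u] for u in others) for s in states}

def mixing_constant(p, states):
    """Smallest gamma for which p is gamma-mixing."""
    return max(max(return_times(p, states, t).values()) for t in states)

N = 10  # two-state chain with p(a,b) = p(b,a) = 1/N
p = {"a": {"a": Fraction(N - 1, N), "b": Fraction(1, N)},
     "b": {"a": Fraction(1, N), "b": Fraction(N - 1, N)}}
print(mixing_constant(p, "ab"))
```

For this two-state chain the constant equals $N$, which shows how slowly mixing chains of the kind met in Remark 1 require a large $\gamma$.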

Plainly, a $\gamma$-mixing transition function is irreducible. The next theorem
bounds the speed of convergence of the empirical occupation measure to
the invariant measure for $\gamma$-mixing homogeneous Markov chains. In this
statement $\nu_m$ is the occupancy measure in stages 1 through $m$:

    $\nu_m(s) = \frac{1}{m} |\{1 \le n \le m : z_n = s\}|$.

Theorem 3. Assume that the transition function $p$ is $\gamma$-mixing and let
$\mu$ be its invariant measure. Let $n \in \mathbb{N}$ and $\varepsilon \in (0, 1/2)$ be such that $\varepsilon n > 4\gamma$.
Then, for every $s,t \in S$,

(2)  $P_{t,p}( |\frac{\nu_n(s)}{\mu(s)} - 1| > \varepsilon ) \le \frac{17\gamma}{n\varepsilon^2}$.

Remark 2. Inspection of the proof shows that inequality (2) holds more
generally for each state $s \in S$ such that $\max_{t \in S} E_{t,p}[T_s^+] \le \gamma$.

Remark 3. Since $|\nu_n(s) - \nu^z_n(s)| \le \frac{1}{n}$, one has, under the assumptions
of Theorem 3,

(3)  $P_{t,p}( |\frac{\nu^z_n(s)}{\mu(s)} - 1| > \varepsilon + \frac{1}{n\mu(s)} ) \le \frac{17\gamma}{n\varepsilon^2}$.

Remark 4. It is likely that the bound in Theorem 3 can be substantially
improved, possibly to an exponential bound. Recently, Glynn and Ormoneit
(2002) provided a generalization of Hoeffding's inequality to uniformly ergodic
chains. However, their ergodicity assumption (A1) is stronger than our
mixing assumption, hence our result does not follow from their statement.

8 D. ROSENBERG, E. SOLAN AND N. VIEILLE 

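Proof of Theorem 3. The proof relies on the following two identities:

(4)  $E_{s,p}[T_s^+] = \frac{1}{\mu(s)}$   and   $\mu(s) \mathrm{Var}_{s,p}(T_s^+) = 2E_{\mu,p}[T_s] + 1 - \frac{1}{\mu(s)}$

[see Aldous and Fill (2002), Chapter 2, identity (22) for the second one].
Since $p$ is $\gamma$-mixing, $1/\mu(s) = E_{s,p}[T_s^+] \le \gamma < \varepsilon n/4$, and $E_{\mu,p}[T_s] \le E_{\mu,p}[T_s^+] \le
\gamma$. Since $1 - 1/\mu(s) \le 0$, we also have $\mu(s) \mathrm{Var}_{s,p}(T_s^+) \le 2\gamma$.

For notational clarity, we set $n_\varepsilon = \lceil n\mu(s)(1 - \varepsilon) \rceil$ and $n^\varepsilon = \lfloor n\mu(s)(1 + \varepsilon) \rfloor$.
Note that $n_\varepsilon \le n^\varepsilon$. Moreover, $n_\varepsilon + n^\varepsilon - 1 \le 2n\mu(s)$, so that $n_\varepsilon + n^\varepsilon \le 3n\mu(s)$.

For $s \in S$ and $k \in \mathbb{N}$, let $V_{s,k}$ denote the time of the $k$th return to $s$ (with
$V_{s,0} = 0$), and let $T^+_{s,k} = V_{s,k} - V_{s,k-1}$ denote the length of the $k$th visit to
$S \setminus \{s\}$.

On $\{\nu_n(s) < \mu(s)(1 - \varepsilon)\}$ one has $V_{s,n_\varepsilon} > n$, whereas on $\{\nu_n(s) > \mu(s)(1 +
\varepsilon)\}$ one has $V_{s,n^\varepsilon} \le n$. Therefore, the event $\{|\frac{\nu_n(s)}{\mu(s)} - 1| > \varepsilon\}$ is included in
the union of the two events $\{V_{s,n_\varepsilon} > n\}$ and $\{V_{s,n^\varepsilon} \le n\}$, so that

(5)  $P_{t,p}( |\frac{\nu_n(s)}{\mu(s)} - 1| > \varepsilon ) \le P_{t,p}(V_{s,n_\varepsilon} > n) + P_{t,p}(V_{s,n^\varepsilon} \le n)$.

We will prove the result by providing an upper bound on the probability
that $V_{s,n_\varepsilon} > n$ and on the probability that $V_{s,n^\varepsilon} \le n$.

Since $\varepsilon < 1/2$ and since $\varepsilon n \mu(s) > 4$, straightforward manipulations show
that

(6)  $\min\{ n - \frac{n_\varepsilon}{\mu(s)}, \frac{n^\varepsilon}{\mu(s)} - n \} \ge \frac{3}{4} n\varepsilon$   and

(7)  $\min\{ n(1 - \varepsilon^2) - \frac{n_\varepsilon - 1}{\mu(s)}, \frac{n^\varepsilon - 1}{\mu(s)} + 1 - n \} \ge \frac{1}{2} n\varepsilon$.

We distinguish the two cases $s = t$ and $s \ne t$.

Case 1. $s = t$.

In this case the random variables $T^+_{s,k}$ are i.i.d. and share the law of $T_s^+$.
Since $E_{s,p}[T_s^+] = 1/\mu(s)$, one has, by Chebyshev's inequality and by (6),

    $P_{s,p}(V_{s,n_\varepsilon} > n) = P_{s,p}( V_{s,n_\varepsilon} - \frac{n_\varepsilon}{\mu(s)} > n - \frac{n_\varepsilon}{\mu(s)} ) \le \frac{n_\varepsilon \mathrm{Var}_{s,p}(T_s^+)}{(\frac{3}{4} n\varepsilon)^2}$

and

    $P_{s,p}(V_{s,n^\varepsilon} \le n) \le P_{s,p}( \frac{n^\varepsilon}{\mu(s)} - V_{s,n^\varepsilon} \ge \frac{n^\varepsilon}{\mu(s)} - n ) \le \frac{n^\varepsilon \mathrm{Var}_{s,p}(T_s^+)}{(\frac{3}{4} n\varepsilon)^2}$.

Hence, by (5), (4) and since $p$ is $\gamma$-mixing, one obtains

(8)  $P_{s,p}( |\frac{\nu_n(s)}{\mu(s)} - 1| > \varepsilon ) \le \frac{(n_\varepsilon + n^\varepsilon) \mathrm{Var}_{s,p}(T_s^+)}{(\frac{3}{4} n\varepsilon)^2} \le \frac{16 \times 3 \times 2\gamma}{9 n\varepsilon^2} \le \frac{17\gamma}{n\varepsilon^2}$.

Case 2. $s \ne t$.

By Markov's inequality and since $p$ is $\gamma$-mixing,

(9)  $P_{t,p}(T_s^+ > \varepsilon^2 n) \le \frac{\gamma}{\varepsilon^2 n}$.

By repeating the steps of Case 1 using (7), one has

(10)  $P_{t,p}(V_{s,n_\varepsilon} > n, T_s^+ \le \varepsilon^2 n) \le P_{s,p}(T^+_{s,1} + \cdots + T^+_{s,n_\varepsilon - 1} > n(1 - \varepsilon^2))$
        $\le \frac{(n_\varepsilon - 1) \mathrm{Var}_{s,p}(T_s^+)}{(n(1 - \varepsilon^2) - (n_\varepsilon - 1)/\mu(s))^2} \le \frac{(n_\varepsilon - 1) \mathrm{Var}_{s,p}(T_s^+)}{(\frac{1}{2} n\varepsilon)^2}$,

while

(11)  $P_{t,p}(V_{s,n^\varepsilon} \le n) \le P_{s,p}(T^+_{s,1} + \cdots + T^+_{s,n^\varepsilon - 1} \le n - 1) \le \frac{(n^\varepsilon - 1) \mathrm{Var}_{s,p}(T_s^+)}{(\frac{1}{2} n\varepsilon)^2}$.

By summing (9)-(11), one obtains

    $P_{t,p}( |\frac{\nu_n(s)}{\mu(s)} - 1| > \varepsilon ) \le \frac{4 \mathrm{Var}_{s,p}(T_s^+)(n_\varepsilon + n^\varepsilon - 2) + \gamma n}{n^2 \varepsilon^2}$.

Therefore,

    $P_{t,p}( |\frac{\nu_n(s)}{\mu(s)} - 1| > \varepsilon ) \le \frac{4 \times 2 \times 2\gamma + \gamma}{n\varepsilon^2} = \frac{17\gamma}{n\varepsilon^2}$,

as desired. □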



2.3.2. Expected exit times. We here analyze the exit time from a given
sub-domain. Our estimates use two new mixing measures for irreducible
transition functions.

Throughout this section we assume that the transition function $p$ is irreducible
with invariant measure $\mu$. We use repeatedly the inequality

(12)  $E_{s,p}[T_{\bar{L}}] \le E_{s,p}[T_{\bar{L} \cup \{t\}}] + E_{t,p}[T_{\bar{L}}]$,

which holds for every $L \subseteq S$ and every $s,t \in L$.

Definition 3. For $C \subseteq S$, we define

    $\lambda_p(C) = \max_{s \in C} E_{s,p}[T_{\bar{C}}]$   and   $\rho_p(C) = \max_{D \subset C} \min_{s \in D} E_{s,p}[T_{\bar{D}}]$.

Observe that one always has $\lambda_p(C) \ge \rho_p(C)$. $\lambda_p(C)$ bounds the time it
takes to leave $C$. $\rho_p(C)$ may be interpreted as a measure of how fast a
Markov chain with transition function $p$ visits each and every state of $C$.
The following lemma adds substance to these interpretations. We shall use
it to derive further estimates of exit times.

Lemma 1. For every $C \subseteq S$:

(i) $E_{s,p}[T_{\bar{D}}] \le |D| \rho_p(C)$, for every $D \subset C$ and every $s \in D$.

(ii) $E_{s,p}[T_{\bar{C}}] \ge \lambda_p(C) - (|C| - 1)\rho_p(C)$, for every $s \in C$.

Proof. We prove the first statement by induction over $|D|$. Plainly,
the inequality holds for singletons. Assume it holds for every subset with
$k$ elements. Let $D \subset C$ be any subset with $k + 1$ elements, and let $s \in
D$. Choose $t \in D$ such that $E_{t,p}[T_{\bar{D}}] \le \rho_p(C)$. By (12) and the induction
hypothesis, applied to $D \setminus \{t\}$, we have

    $E_{s,p}[T_{\bar{D}}] \le E_{s,p}[T_{\bar{D} \cup \{t\}}] + E_{t,p}[T_{\bar{D}}] \le (|D| - 1)\rho_p(C) + \rho_p(C)$.

We now prove the second statement. Let $s \in C$ be given, and let $t \in C$ be
such that $\lambda_p(C) = E_{t,p}[T_{\bar{C}}]$. If $t = s$, (ii) trivially holds. Otherwise, by (12)
and (i),

    $E_{s,p}[T_{\bar{C}}] \ge E_{t,p}[T_{\bar{C}}] - E_{t,p}[T_{\bar{C} \cup \{s\}}] \ge \lambda_p(C) - (|C| - 1)\rho_p(C)$,

as desired. □

The following lemma bounds the probability that the process leaves a set
$C$ before it visits some given state $t \in C$.

Lemma 2. For every $C \subset S$ and every $s,t \in C$, one has

(13)  $P_{s,p}(T_{\bar{C}} < T_t) \le 2|C| \frac{\rho_p(C)}{\lambda_p(C) - (|C| - 1)\rho_p(C)}$.

Proof. If $s = t$, the left-hand side in (13) vanishes, so that the lemma
trivially holds. Hence we assume from now on that $s \ne t$, so that $|C| \ge 2$,
and, therefore, $\rho_p(C) \ge 1$.

We modify the state space $S$ and the transition function $p$ by collapsing
$\bar{C}$ to a single state, still denoted $\bar{C}$, which leads to $t$ in one step. Since
this change does not affect the probability that $T_{\bar{C}} < T_t$, we still denote the
modified transition function by $p$. This amounts to assuming that $p(\bar{C}, t) = 1$,
hence $E_{\bar{C},p}[T_t] = 1$. By Aldous and Fill [(2002), Chapter 2, Corollary 10],

(14)  $P_{s,p}(T_{\bar{C}} < T_t) = \frac{E_{s,p}[T_t] + E_{t,p}[T_{\bar{C}}] - E_{s,p}[T_{\bar{C}}]}{E_{\bar{C},p}[T_t] + E_{t,p}[T_{\bar{C}}]}$.

Since $E_{\bar{C},p}[T_t] = 1$, one has, by (12), $E_{s,p}[T_t] \le E_{s,p}[T_{\bar{C} \cup \{t\}}] + 1$. Equation
(12) also implies that $E_{t,p}[T_{\bar{C}}] - E_{s,p}[T_{\bar{C}}] \le E_{t,p}[T_{\bar{C} \cup \{s\}}]$. Therefore, by Lemma
1(i), the numerator in (14) is at most

    $1 + E_{t,p}[T_{\bar{C} \cup \{s\}}] + E_{s,p}[T_{\bar{C} \cup \{t\}}] \le 2(|C| - 1)\rho_p(C) + 1 \le 2|C| \rho_p(C)$.

On the other hand, the denominator is equal to $1 + E_{t,p}[T_{\bar{C}}]$, hence by
Lemma 1(ii) is at least $\lambda_p(C) - (|C| - 1)\rho_p(C)$. □

For $C \subset S$, we denote by $p_C$ the transition function $p$ watched on $C$ [see
Aldous and Fill (2002), Chapter 2, Section 7.1]. Formally,

(15)  $p_C(s,t) = p(s,t) + \sum_{u \notin C} p(s,u) P_{u,p}(T_C = T_t)$   for every $s,t \in C$.

Since $p$ is irreducible, the transition function $p_C$ is irreducible, and its invariant
measure coincides with the invariant measure $\mu$ of $p$, conditioned on
$C$: $\mu(s|C) = \mu(s)/\mu(C)$, for every $s \in C$ [see Aldous and Fill (2002)].

The next lemma bounds the time it takes the process to reach a given
state $t \in C \subseteq S$, when watched on $C$. Thus, it bounds the expected number
of stages the Markov chain with transition function $p$ spends in $C$ until it
reaches $t$ for the first time.

Lemma 3. For $s,t \in C$, one has

    $E_{s,p_C}[T_t] \le \frac{(|C| - 1)\rho_p(C)}{1 - \max_{u \in C} P_{u,p}(T_{\bar{C}} < T_t)}$.

Proof. If $s = t$ the lemma trivially holds, as in this case $E_{s,p_C}[T_t] = 0$.

Assume then that $s \ne t$, so, in particular, $|C| \ge 2$. Let $t \in C$ be given. For
convenience set $a = \max_{s \in C} E_{s,p_C}[T_t]$, and let $s' \in C$ achieve the maximum.
Since $|C| \ge 2$, $s' \ne t$. Therefore, by Lemma 1(i),

    $a = E_{s',p_C}[T_t] \le E_{s',p}[T_{\bar{C} \cup \{t\}}] + P_{s',p}(T_{\bar{C}} < T_t) a$
      $\le (|C| - 1)\rho_p(C) + a P_{s',p}(T_{\bar{C}} < T_t)$.

Thus, for every $s \in C$,

    $E_{s,p_C}[T_t] \le a \le \frac{(|C| - 1)\rho_p(C)}{1 - P_{s',p}(T_{\bar{C}} < T_t)} \le \frac{(|C| - 1)\rho_p(C)}{1 - \max_{u \in C} P_{u,p}(T_{\bar{C}} < T_t)}$,

as desired. □

We conclude with two results, stated without proof. First, given $C \subset S$,
define

(16)  $K_p(C) = \frac{\mu(C)}{\sum_{s \in C} \mu(s) p(s, \bar{C})}$.

The numerator in (16) is the frequency of stages spent in $C$, while the denominator
is the frequency of exits from $C$. Therefore, $K_p(C)$ is the average
length of a visit to $C$. In particular, the following holds:

(17)  $\min_{s \in C} E_{s,p}[T_{\bar{C}}] \le K_p(C) \le \max_{s \in C} E_{s,p}[T_{\bar{C}}] = \lambda_p(C)$.

Second, straightforward computations show that for every $C \subset S$,

(18)  $\frac{\mu(C)}{\mu(\bar{C})} \ge \frac{\min_{s \in C} E_{s,p}[T_{\bar{C}}]}{\max_{s \in \bar{C}} E_{s,p}[T_C]}$.

2.4. Proof of Theorem 1. This section is devoted to the proof of Theorem
1. It is convenient to deal with sequences $x$ that are exhaustive ($N^x_s > 0$ for
each $s \in S$) and periodic ($x_N = x_0$). The general result will follow since
an arbitrary sequence $x$ can be extended into an exhaustive and periodic
one, by appending at most $|S|$ elements to $x$ (see details in Section 2.4.3).
Observe that for the purpose of Theorem 1, all sequences can be assumed
to be exhaustive, since states that are not visited along $x$ can simply be
dropped. Since this assumption cannot be made to prove the more general
theorem of this paper, we prefer not to make it here as well.

The assumption that the sequence is exhaustive and periodic allows us to 
make use of the following lemma whose proof is omitted. 

Lemma 4. Let $x = (x_0, \dots, x_N)$ be exhaustive and periodic. The observed
transition function $p^x$ is irreducible, and its invariant measure coincides with
the observed occupancy measure $\nu^x(s) = N^x_s/N$.

Let $\varepsilon \in (0, \frac{1}{2})$, $\delta \in (0, \frac{1}{2(4|S|+1)})$ and $\zeta \in (0, 2\delta)$ be given. We choose $N_0 \in \mathbb{N}$

large enough so that (Nl) JV|}-{4|-5|+1)'5 > 2^s\/^^ ^^2) N^^''^ > 4 x 
17|S'|^/e2. Therefore, we have, in particular, (N3) (iYo^^ + l^l-^liVo^-^ < e/(2l-5| + 
1), (N4) > max{|S|Ve, 2|5| + 2}, (N5) N^^ > 8/e and (N6) N^^ > max{l/e, 10\S\}. 

We will prove that the conclusion of Theorem 1 holds for every N > Nq 
and every exhaustive, periodic sequence x. We first apply Theorem 2 to 
the sequence x, with a = N^^ , to obtain a partition C = {Si, . . . , Sk) of S 
that satisfies the conclusions of that theorem. Observe that $a$ depends on
the length of the sequence. We now proceed as follows. In Section 2.4.1 we
argue that the transition function $p^x$ is mixing, when watched on any atom
$S_k$ of $\mathcal{C}$. In Section 2.4.2 we define the approximating process, and we check
that assertions (B1) and (B2) hold.

2.4.1. Properties of $p^x_{S_k}$. Following the notation in use in Section 2.3.2,
we denote by $p^x_{S_k}$ the transition function $p^x$, when watched on $S_k$. Since
$p^x$ is irreducible, so is $p^x_{S_k}$. The goal of this section is to prove that $p^x_{S_k}$



APPROXIMATING A SEQUENCE 13 

is $N^{1-3\delta}$-mixing (see Proposition 1 below). To this end, we first relate the
mixing constants $\lambda_{p^x}(S_k)$ and $\rho_{p^x}(S_k)$ to the features of $x$.



Lemma 5. Let k be such that \Sk \ > 1- One has 
(19) Pp-{Sk) < max . ^ < -Xp^{Sk 



Note that the quantity nx ° i is approximately the average length of a visit 
to D along x. Thus, the expected exit time from D C Sk is much smaller 
than the expected exit time from Sk- 

Proof of Lemma 5. Let $k$ be such that $|S_k| > 1$, and let $D \subseteq S_k$
(here $D$ may be equal to $S_k$). By Lemma 4 one has $\nu^x(D) = N^x_D/N$ and
$\sum_{s \in D} \nu^x(s) p^x(s, \bar{D}) = N^x_{D,\bar{D}}/N$. By (16), as long as $D \subset S$, $K_{p^x}(D) = N^x_D/N^x_{D,\bar{D}}$,
so that

(20)  $\frac{N^x_D}{R^x_D} \le K_{p^x}(D) \le \frac{N^x_D}{R^x_D - 1}$.


If Z) is a strict subset of Sk, (17), the second inequality in (20), (P2) and (N4) 
yield 

(21) minE, „JT-r:] < K„4D) < ^< ^ — < - x 

The left-hand side inequality in (19) follows by taking the maximum over
$D \subset S_k$.

If $S_k = S$, $\lambda_{p^x}(S_k) = +\infty$, and the right-hand side inequality in (19) trivially
holds. Otherwise, when applied to $S_k$, the first inequality in (20) and
(17) yield

(22)  $\frac{N^x_{S_k}}{R^x_{S_k}} \le K_{p^x}(S_k) \le \lambda_{p^x}(S_k)$,

and the right-hand side inequality in (19) follows from (21) and (22). □

Proposition 1. If $|S_k| \ge 2$, the transition function $p^x_{S_k}$ is $N^{1-3\delta}$-mixing.

Proof. We will prove that $E_{s,p^x_{S_k}}[T_t] \le N^{1-3\delta} - 1$, for each $s,t \in S_k$.
Let $s,t \in S_k$ be given. By Lemma 3,

    $E_{s,p^x_{S_k}}[T_t] \le \frac{(|S_k| - 1)\rho_{p^x}(S_k)}{1 - \max_{u \in S_k} P_{u,p^x}(T_{\bar{S}_k} < T_t)}$.

By Lemma 2, the denominator is at least $1 - 2|S_k| \frac{\rho_{p^x}(S_k)}{\lambda_{p^x}(S_k) - (|S_k| - 1)\rho_{p^x}(S_k)} \ge \frac{1}{2}$,
where the inequality holds by Lemma 5 and (N6). Therefore,

(23)  $E_{s,p^x_{S_k}}[T_t] \le 2|S_k| \rho_{p^x}(S_k)$.
By Lemma 5, (P2) and (N4), 

(24) p„^(5'fc)<max — < —r, < rrrn — • 

The result follows by combining (23) and (24). □ 

2.4.2. The approximating process. We now construct a Markov chain $z$
that approximates the sequence $x$. Ideally the chain is composed of $|\mathcal{C}|$ pieces,
with the length of piece $k$ being $N^x_{S_k}$. However, to avoid degenerate cases,
we take into account only the atoms that are frequently visited, that is,
those with $N^x_{S_k} \ge N^{1-\delta}$.

Set $K_0 = \{k : N^x_{S_k} \ge N^{1-\delta}\}$, and assume for convenience that $K_0$ contains
the first $|K_0|$ atoms in $\mathcal{C}$, so that $K_0 = \{1, \dots, |K_0|\}$. Assume, moreover,
that the set $S_{|K_0|}$ is the most frequently visited set, so that, in particular, $N^x_{S_{|K_0|}} \ge N/|S|$.

The chain $z$ is a piecewise homogeneous Markov chain with $|K_0|$ pieces.
The "extra" stages that are created by the removal of rarely visited atoms
in $\mathcal{C}$ are added to piece $|K_0|$. Since piece $|K_0|$ is the most frequently visited
piece, this will hardly affect the estimates for that piece.

Formally, we denote by $m_k$ the length of piece $k$. Thus, $m_k = N^x_{S_k}$ if $k <
|K_0|$, and $m_k = N - \sum_{j < |K_0|} m_j$ if $k = |K_0|$. In particular, $1 \le m_{|K_0|}/N^x_{S_{|K_0|}} \le 1 + |S|^2/N^\delta$.
For $k = 1, \dots, |K_0|$, we let $p_k$ be a transition function such that

    $p_k(s,t) = p^x_{S_k}(s,t)$,   $s \in S_k$, $t \in S_k$,
    $p_k(s,S_k) = 1$,   $s \notin S_k$.

The exact definition of $p_k(s, \cdot)$ for $s \notin S_k$ is irrelevant. Thus, $p_k$ moves in one
step to $S_k$ and coincides with $p^x_{S_k}$ there.

We let $z$ be a Markov chain with initial state in $S_1$, and transitions
$p^z_n(s,t) = p_k(s,t)$ if $-1 + \sum_{j<k} m_j \le n < -1 + \sum_{j \le k} m_j$. (This does not define
$p^z_{N-1}$. The choice of $p^z_{N-1}$ is irrelevant. On the other hand, it defines
$p^z_{-1}$, which is never used.) Note that $z$ is a piecewise homogeneous Markov
chain with $|K_0| \le |S|$ pieces.

Plainly, the chain $z$ visits each set $S_k$, $k \le |K_0|$, only once, for exactly $m_k$
stages: from stage $\sum_{j<k} m_j$ to stage $\sum_{j \le k} m_j - 1$ (inclusive).

We now prove that both assertions in Theorem 1 hold.

We start with assertion (B1). Let $k$ and $s \in S_k$ be given. We discuss
three cases.

If $N^x_s < N^{1-\delta}$, then $\nu^x(s) = N^x_s/N < 1/N^\delta$, and (B1) trivially holds.

Assume now that $N^x_s \ge N^{1-\delta}$ and that $S_k = \{s\}$ is a singleton. By construction,
the chain $z$ will be in state $s$ all through piece $k$, and only in those
stages. In particular, by (N4),



< 



and (Bl) holds. 



1 < 



Assume finally that Ng > N"^ ^ and \Sk\ > 2. By (N5) and Proposition 



nik-N^ \Sl 



1, the assumptions of Theorem 3 hold w.r.t. p = Pg ■j = N 



1-35 



n ■■ 



N 



1-5 



and e/2. For every t ^ Sk, one has, by (N5), Remark 3 and (N2), 



>e 



(25) 



Pk 



<4 X ^ <4 



£ 1 

> - H 

2 mfczy^(s|5fc) 



1 

w 



Since the chain $z$ does not visit $s \in S_k$, except in piece $k$, (B1) follows from
(25).

We now turn to assertion (B2). Since $z$ never visits states in $\bigcup_{k \notin K_0} S_k$,
we need to verify that (B2) holds for states $s \in \bigcup_{k \in K_0} S_k$. Observe that for
every $k \in K_0$ and every $s \in S_k$, one has



\\Pk{s,-) 



u^Sk 



If Sk = {s} is a singleton, then by Theorem 2(P1) and (N3), the right- 
hand side is bounded by jyt-I — < If l-S'fcl > 2, then since > Rg, by 

S S 

Theorem 2(P2) and (N6), the right-hand side is bounded by < ^^4^^^, 



< 



£. 



Since for every stage $n$ in piece $k$, except the last one, $p^z_n = p_k$ and $z_n \in S_k$,
$P_z$-a.s., one has for such $n$'s,

    $\| p^z_n(z_n, \cdot) - p^x(z_n, \cdot) \|_\infty = \| p_k(z_n, \cdot) - p^x(z_n, \cdot) \|_\infty < \varepsilon$

almost surely.



2.4.3. The case of arbitrary sequences. Our goal now is to prove Theorem
1 for any sequence of observations. We add a few stages to the sequence
$x$ (fictitious observations) in order to obtain a periodic and exhaustive
sequence $x^*$. We then apply the above analysis to the augmented sequence
$x^*$. Since only a few fictitious observations are needed to change $x$ into a
periodic and exhaustive sequence, the desired result will follow.

We let e G (0, 1/2), (5 G (0, 2(a[E\+T)) C e (0, 2S) be given. Set 6' = 26e 

(O'iM+i) ^^'^ e' = e-2£^. 

Choose $N_0 \in \mathbb{N}$ such that (N1) and (N2) hold for $N_0$ w.r.t. $\delta'$, $\varepsilon'$ and $\zeta$
(rather than w.r.t. $\delta$, $\varepsilon$ and $\zeta$). We will argue that the conclusion of Theorem
1 holds for every $N \ge N_0$.

Let $x = (x_0, \dots, x_N)$ be an arbitrary sequence in $S$, and let $S^* = \bigcup_{n=0}^{N} \{x_n\} \subseteq
S$ be the set of states visited by $x$. Consider the sequence $x^* = (x_0, x_1, \dots, x_N, x^*_1, \dots, x^*_r, x_0)$,
where $r = |S| - |S^*|$ is the number of states not visited by $x$, and $S \setminus S^* =
\{x^*_1, \dots, x^*_r\}$. By construction, this new sequence is periodic and exhaustive.
The length $N^* + 1$ of this sequence is $N + r + 2 \le N + |S| + 2$.
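
The augmentation step is easy to express in code. The following sketch, with the hypothetical function name `make_periodic_exhaustive`, appends the unvisited states followed by the initial state; the order in which unvisited states are appended is an arbitrary choice made here for illustration.

```python
def make_periodic_exhaustive(x, states):
    """Append the states not visited by x, then x_0, so that the result
    visits every state before its last entry and ends where it starts."""
    visited = set(x)
    missing = [s for s in states if s not in visited]
    return list(x) + missing + [x[0]]

x_star = make_periodic_exhaustive("abba", "abcd")
print("".join(x_star), len(x_star))
```

With $N + 1 = 4$ original observations and $r = 2$ unvisited states, the augmented sequence has $N + r + 2 = 7$ entries, in agreement with the length computed in the text.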

By construction $N^*$ satisfies (N1) and (N2) with $\delta'$ and $\varepsilon'$. Therefore,
there is a piecewise Markov chain $(z_n)_{n \le N^*}$ such that (B1) and (B2) hold
w.r.t. $N^*$, $\varepsilon'$, $\delta'$ and $\zeta$. Observe that each state $x_j^* \in S \setminus S^*$ constitutes
a singleton in the partition $\mathcal{C}$ associated with $x^*$, and that $N^{x^*}_{x_j^*} = 1$, so that
it is never visited by $z$.

One can verify that the restriction of $z$ to the first $N$ stages satisfies
(B1) and (B2) w.r.t. $N$, $\varepsilon$, $\delta$ and $\zeta$. The computations are tedious and
of no specific interest, and are therefore omitted.

3. The general problem. 

3.1. Presentation and discussion. We here address the more general problem
of devising an approximating simple process, given structural constraints
on the process. In other words, we wish to construct a simple process within
a given class of processes. The kind of structural constraints we allow for
is described as follows. For each $s \in S$, we let a nonempty polyhedron
$V(s) \subseteq \mathcal{P}(S)$ be given. Recall that a polyhedron is the convex hull of finitely
many points. Let $V = (V(s))_{s \in S}$ denote the product polyhedron, and for
every $s \in S$ denote by $V^*(s)$ the set of extreme points of $V(s)$.

Definition 4. A $V$-process is an $S$-valued process $z = (z_n)_n$ such that
for every $n \ge 1$, the conditional distribution of $z_n$, given $z_0, \ldots, z_{n-1}$, is in
$V(z_{n-1})$, $P_z$-a.s.

In a sense, one-step transitions are required to satisfy exogenously given
constraints described by the polyhedra $V(s)$, $s \in S$.

We will weaken the simplicity requirement and introduce the notion of
piecewise homogeneous hidden Markov chain.






Definition 5. A process $z = (z_n)$ over $S$ is a (piecewise homogeneous)
hidden Markov chain if there are a finite set $S'$ and a (piecewise homogeneous)
Markov chain $w = (w_n)$ over $S \times S'$ such that $z$ is the projection of
$w$ over $S$.

Thus, a hidden Markov chain is the projection of a Markov chain with 
values in a product space. Correspondingly, a piecewise homogeneous hidden 
Markov chain is the projection of a piecewise homogeneous Markov chain. 
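
As an illustration of Definition 5, the following Python sketch samples a homogeneous Markov chain $w$ on a product space and projects it on its first coordinate. The toy transition function `pi`, the hidden set $\{0,1\}$ and the function name are assumptions made for the example only; they are not taken from the construction used later in the paper.

```python
import random


def sample_hidden_chain(pi, w0, n_steps, rng):
    """Sample w on S x S' with transition function pi, and return (w, z).

    pi : dict mapping a pair (s, s') to a dict {next_pair: probability}
    z  : the projection of w over S, i.e. the hidden Markov chain
    """
    w = [w0]
    for _ in range(n_steps):
        pairs, probs = zip(*pi[w[-1]].items())
        w.append(rng.choices(pairs, weights=probs, k=1)[0])
    z = [s for (s, _hidden) in w]
    return w, z


S = ["a", "b"]
pi = {}
for s in S:
    # Hidden bit 0: jump to a uniformly chosen state and set the bit to 1.
    pi[(s, 0)] = {(t, 1): 0.5 for t in S}
    # Hidden bit 1: repeat the current state and reset the bit to 0.
    pi[(s, 1)] = {(s, 0): 1.0}

w, z = sample_hidden_chain(pi, ("a", 0), 20, random.Random(0))
```

In this toy example the projected process `z` repeats its state at every other stage, so its law depends on the hidden bit and `z` alone is not a Markov chain over $S$.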

We are now in a position to describe the problem considered in this section.
Given a sequence $x = (x_0, \ldots, x_N)$ in $S$ with finite length $N + 1$, does there
exist a stochastic process $z$ that (i) is both a $V$-process and a piecewise
homogeneous hidden Markov chain, and (ii) approximates $x$ in the sense
that both assertions (B1) and (B2) in Theorem 1 hold?

Without further qualifications, the answer is negative. Indeed, if all $V$-processes
are transient, assertion (B1) cannot hold. On the other hand, if the
sequence $x$ is not typical, in the sense that the observed transition function
is far from $V$ [i.e., $p^x(s,\cdot)$ is far from $V(s)$ in the Euclidean norm for some
$s \in S$], assertion (B2) cannot hold. The following two examples illustrate
these points. In both examples $V(s)$ is a singleton for each $s \in S$, hence
there is a unique $V$-process, which is a Markov chain.

Example (A nonirreducible Markov chain). Let $S = \{a,b,c\}$. Define $V$
so that both states $b$ and $c$ are absorbing, while state $a$ leads with equal
probability to states $b$ and $c$. Starting from state $a$, one of the two sequences
$(a,b,b,b,\ldots,b)$ and $(a,c,c,c,\ldots,c)$ results. But if the given sequence is, for
example, $x = (a,b,b,b,\ldots,b)$, the unique $V$-process does not satisfy (B1)
when starting from state $a$.

We shall therefore restrict our study to sets $V$ such that there exists an
irreducible homogeneous $V$-Markov chain.

Example (A nontypical sequence). Let $S = \{a,b\}$ and define $V$ so that
both states lead with equal probability to $a$ and $b$. If the given sequence is
$x = (a,a,\ldots,a)$, the unique $V$-process does not satisfy (B2).
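
The second example can be checked numerically. The sketch below computes the observed visit counts and the observed transition function of a sequence, assuming (as the Introduction suggests) that both are counted over the stages $m < N$; the function name and the dictionary representation are choices of this sketch.

```python
from collections import Counter


def empirical_transitions(x):
    """Return (visits, p) for x = (x_0, ..., x_N), counting stages m < N.

    visits[s] : number of stages m < N with x_m = s
    p[(s, t)] : fraction of those stages that are followed by t
    """
    visits = Counter(x[:-1])
    pairs = Counter(zip(x[:-1], x[1:]))
    p = {(s, t): c / visits[s] for (s, t), c in pairs.items()}
    return visits, p


x = ["a"] * 11                                  # the nontypical sequence, N = 10
visits, p = empirical_transitions(x)
# The unique element of V in the example: every state moves to a or b w.p. 1/2.
v = {("a", "a"): 0.5, ("a", "b"): 0.5, ("b", "a"): 0.5, ("b", "b"): 0.5}
gap = abs(p[("a", "a")] - v[("a", "a")])
```

The observed transition from $a$ to $a$ has frequency 1 while the only admissible value is 1/2, so the gap of 1/2 does not shrink as $N$ grows, which is why (B2) fails.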

We shall therefore limit ourselves to sequences that are typical w.r.t. V, 
in the following sense: 

Definition 6. Let $N \in \mathbb{N}$ and $\delta,\varepsilon > 0$ be given. A sequence $x = (x_0,\ldots,x_N)$
is $(N,\delta,\varepsilon)$-typical if there exists $v = (v(s,\cdot))_{s \in S} \in V$ such that $|1 - v(s,t)/p^x(s,t)| < \varepsilon$
for every $s,t \in S$ such that either $N^x_s p^x(s,t) \ge N^{\delta}$ or $N^x_s v(s,t) \ge N^{\delta}$.

In Definition 6, $N^x_s$ is the observed number of stages spent in $s$ along $x$,
and $p^x$ is the observed transition function; see Section 2.1.

In words, a sequence $x$ is typical if there is a transition function $v$ such
that (i) $v(s,\cdot) \in V(s)$ for every $s$: $v$ is an admissible transition, and (ii) $v(s,t)$
is close to $p^x(s,t)$ whenever the transition from $s$ to $t$ occurs frequently,
either along $x$ or under $v$. The latter statement is not completely accurate,
as we do not use the invariant distribution of $v$, but rather the observed
occupancy measure.

The set of $(N,\delta,\varepsilon)$-typical sequences is denoted by $T(N,\delta,\varepsilon)$. Note that
the notion of a typical sequence is relative to the family $V$ of polyhedra.
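
A rough Python sketch of the typicality test for one candidate $v$ follows. It rests on several assumptions: the relative error is taken to be $|1 - v(s,t)/p^x(s,t)|$, the form that appears in (31); the frequency threshold is $N^{\delta}$; and membership of $v$ in $V$ is not checked. Definition 6 asks for the existence of a suitable $v$, whereas this helper only tests a given one.

```python
from collections import Counter


def is_typical_for(x, v, delta, eps):
    """Test the condition of Definition 6 for a given candidate v.

    x : list x_0, ..., x_N;  v : dict (s, t) -> probability
    """
    n = len(x) - 1
    visits = Counter(x[:-1])                   # N_s
    pairs = Counter(zip(x[:-1], x[1:]))        # N_{s,t}
    threshold = n ** delta
    for s in visits:
        targets = {t for (u, t) in list(pairs) + list(v) if u == s}
        for t in targets:
            p_st = pairs[(s, t)] / visits[s]
            v_st = v.get((s, t), 0.0)
            if visits[s] * max(p_st, v_st) >= threshold:
                # A frequent transition: the ratio must be within eps of 1.
                if p_st == 0.0 or abs(1 - v_st / p_st) >= eps:
                    return False
    return True


alternating = ["a", "b"] * 50 + ["a"]           # N = 100
flip = {("a", "b"): 1.0, ("b", "a"): 1.0}
constant = ["a"] * 101                          # N = 100
fair = {(s, t): 0.5 for s in "ab" for t in "ab"}
```

With these inputs the alternating sequence passes the test for `flip`, while the constant sequence fails it for `fair`, matching the nontypical example above.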

Our first theorem states that if V contains an irreducible transition func- 
tion, and if x is typical w.r.t. V, then one can approximate x in a proper 
sense by a piecewise homogeneous hidden Markov chain. 

Theorem 4. Assume that $V$ contains an irreducible transition function
$b$, and set $B = \max_{s,t \in S} E_{s,b}[T_t]$. Let $\psi,\eta \in (0,1)$ be given. There exist
$\delta,\varepsilon > 0$ and $N_1 \in \mathbb{N}$ such that the following holds. For every $N \ge N_1$ and every
sequence $x \in T(N,\delta,\varepsilon)$, there exists a $V$-piecewise homogeneous hidden
Markov chain $z$, with at most $|S|$ pieces, such that the following hold:

(Gl) Pz(|^ -l\>v)<w * ^ ^ ^^^^ ^"(«) ^ W- 

(G2) Let $N_0 = |\{n < N : \|p_n(z_n,\cdot) - p^x(z_n,\cdot)\|_\infty > \eta\}|$. Then $E_z[N_0] \le N^{\psi} B$.

Our second theorem states that if $x$ is generated by a $V$-process, then
with high probability it is typical.

Theorem 5. Let $\delta,\varepsilon > 0$ and $\zeta \in (0,\delta/4)$ be given. There exists $N_2$
such that, for every $N \ge N_2$ and every $V$-process $z$, one has

$$P_z(T(N,\delta,\varepsilon)) \ge 1 - \frac{1}{N^{\zeta}}.$$

These two theorems can be combined as follows. Let us postulate that
we get to observe some realization of some $V$-process. Then, with high
probability, we will be able to find a simple $V$-process that typically yields
the observed realization.

This section is organized as follows. In Section 3.2 we prove Theorem 5, 
and we then turn to the proof of Theorem 4 in Section 3.3. 

3.2. Typical sequences: proof of Theorem 5. We here prove Theorem 5.
The proof uses the following large deviation estimate for Bernoulli variables.
Let $(X_n)_{n \in \mathbb{N}}$ be an infinite sequence of i.i.d. Bernoulli r.v.s with parameter
$p$, and set $\bar X_n = \sum_{i=1}^{n} X_i / n$ for each $n \in \mathbb{N}$. By Alon, Spencer and Erdős
[(2000), Corollary A.14],

$$P(|\bar X_n - p| > \varepsilon p) < 2\exp(-c_\varepsilon p n),$$

where $c_\varepsilon = \min\{\varepsilon^2/2,\, -\varepsilon + (1+\varepsilon)\ln(1+\varepsilon)\}$ is independent of $n$ and $p$. Hence,
for $k \in \mathbb{N}$,

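(26) $P\Bigl(\sup_{n:\, pn \ge k} |\bar X_n - p| > \varepsilon p\Bigr) \le 2\sum_{n=\lceil k/p \rceil}^{\infty} \exp(-c_\varepsilon p n) \le \dfrac{2\exp(-c_\varepsilon k)}{1 - \exp(-c_\varepsilon p)}.$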

Observe that for every e sufficiently small, < < e^/2. 
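
The single-$n$ estimate can be sanity-checked exactly for small parameters. The sketch below evaluates the binomial tail $P(|\bar X_n - p| > \varepsilon p)$ term by term and compares it with $2\exp(-c_\varepsilon p n)$, using the constant $c_\varepsilon = \min\{\varepsilon^2/2, -\varepsilon + (1+\varepsilon)\ln(1+\varepsilon)\}$ of the cited corollary. The parameter values are arbitrary choices made for illustration.

```python
import math


def c_eps(eps):
    """The constant of the large deviation estimate."""
    return min(eps ** 2 / 2, -eps + (1 + eps) * math.log(1 + eps))


def exact_tail(n, p, eps):
    """P(|mean - p| > eps * p) for the mean of n i.i.d. Bernoulli(p) variables."""
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p) > eps * p:
            total += math.comb(n, k) * p ** k * (1 - p) ** (n - k)
    return total


n, p, eps = 200, 0.3, 0.5
tail = exact_tail(n, p, eps)
bound = 2 * math.exp(-c_eps(eps) * p * n)
```

For these values the exact tail is several orders of magnitude below the bound, which is typical of Chernoff-type estimates: they trade sharpness for a form that is uniform in $n$ and $p$.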

Let $\delta,\varepsilon \in (0,1)$ and $\zeta \in (0,\delta/4)$ be given. Choose $\zeta' \in (\zeta,\delta/4)$ and set
e' = (i+g)max g5 | y*(s) | +e ' A''2 G N be large enough so that the following 

inequalities are satisfied for each N > N2: (Tl) 1 _^cxp(^ ~c 'n^ /-^ - 1 ) — ^/^^' ' 
(T2) iV^'"^>3|5|2^,g5|y*(s)|, and (T3) N^/^>1^. 

Let $N \ge N_2$, and let $z = (z_n)_n$ be a $V$-process. We start by introducing a
convenient decomposition for $z$. Recall that for $s \in S$, $V^*(s)$ is the finite set
of the extreme points of $V(s)$. For each $n \ge 0$, the conditional distribution of
$z_{n+1}$, given $z_0, \ldots, z_n$, belongs to $V(z_n)$, hence can be written as a convex
combination $\sum_{v \in V^*(z_n)} b_n(v)\, v$, where the weights $b_n(v)$ are random. We
then divide the choice of $z_{n+1}$ into two steps. In the first step a point $v_n \in
V^*(z_n)$ is drawn according to the weights $b_n$. Next $z_{n+1}$ is drawn according
to $v_n$. In other words, we simply view the given $V$-process as a process
$z = ((z_n, v_n))_n$, where $v_n \in V^*(z_n)$ and the conditional law of $z_{n+1}$, given
past values, is $v_n$, for each $n \ge 0$.
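
The two-step decomposition lends itself to a short simulation sketch. In the Python code below, the data layout (a list of extreme points per state), the callback `weights_fn` returning the convex weights as a function of the past, and the toy polyhedra are all assumptions of the example.

```python
import random


def sample_v_process(extreme_points, weights_fn, z0, n_steps, rng):
    """Sample ((z_n, v_n))_n: first pick an extreme point, then the next state.

    extreme_points : dict s -> list of distributions (dict t -> prob), i.e. V*(s)
    weights_fn     : maps the history z_0..z_n to convex weights over V*(z_n)
    """
    z, chosen = [z0], []
    for _ in range(n_steps):
        ext = extreme_points[z[-1]]
        weights = weights_fn(z)                 # may depend on the whole past
        # Step 1: draw the index of the extreme point v_n.
        i = rng.choices(range(len(ext)), weights=weights, k=1)[0]
        # Step 2: draw z_{n+1} according to v_n.
        targets, probs = zip(*ext[i].items())
        z.append(rng.choices(targets, weights=probs, k=1)[0])
        chosen.append(i)
    return z, chosen


extreme_points = {
    "a": [{"a": 1.0}, {"b": 1.0}],   # V(a): every distribution on {a, b}
    "b": [{"a": 0.5, "b": 0.5}],     # V(b): a single point
}


def weights_fn(history):
    # History-dependent weights: alternate between the extreme points of V(a).
    if len(extreme_points[history[-1]]) == 1:
        return [1.0]
    return [1.0, 0.0] if len(history) % 2 else [0.0, 1.0]


z, chosen = sample_v_process(extreme_points, weights_fn, "a", 30, random.Random(1))
```

Recording `chosen` alongside `z` is what allows the counts of transitions "through" a given extreme point to be formed afterwards.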

Following the notation used in Section 2, we let $N^z_{s,v,t} = |\{n < N : (z_n, v_n, z_{n+1}) =
(s,v,t)\}|$ denote the number of one-step transitions from $s$ to $t$ which occurred
through the extreme point $v$. We also use a number of derived notions.
For instance, $N^z_{s,\cdot,t} = \sum_{v \in V^*(s)} N^z_{s,v,t}$ is the number of one-step transitions
from $s$ to $t$, $N^z_{s,v,\cdot} = \sum_{t \in S} N^z_{s,v,t}$ is the number of visits to $s$ that are followed
by the choice of the extreme point $v$, $N^z_s = \sum_{v \in V^*(s),\, t \in S} N^z_{s,v,t}$ is the number
of visits to $s$, and $p^z((s,v),t) = N^z_{s,v,t}/N^z_{s,v,\cdot}$ is the proportion of transitions
to $t$, out of $(s,v)$. It is defined only when $N^z_{s,v,\cdot} > 0$. Note that the empirical
transition function is given by

$$p^z(s,t) = \frac{N^z_{s,\cdot,t}}{N^z_s} = \sum_{v \in V^*(s)} \frac{N^z_{s,v,\cdot}}{N^z_s}\, p^z((s,v),t).$$

Finally, we set $u^z(s,\cdot) = \sum_{v \in V^*(s)} \frac{N^z_{s,v,\cdot}}{N^z_s}\, v$. It is the average point of $V(s)$
used at $s$. Since $V(s)$ is convex, $u^z(s,\cdot) \in V(s)$ for each $s \in S$. We will show
that with high $P_z$-probability, $p^z$ is close to $u^z$ in the sense of Definition 6.
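
The counts above are straightforward to compute from a realization. The Python sketch below assumes that $u^z(s,\cdot)$ is the visit-weighted average of the extreme points chosen at $s$, which is how the phrase "the average point of $V(s)$ used at $s$" is read here; the data layout (extreme points indexed by position in a list) is an assumption of the sketch.

```python
from collections import Counter, defaultdict


def average_point(z, chosen, extreme_points):
    """Return (p, u): empirical transitions p(s,t) and the average point u(s,t).

    z              : states z_0, ..., z_N
    chosen         : chosen[n] is the index in extreme_points[z[n]] of v_n
    extreme_points : dict s -> list of distributions (dict t -> prob)
    """
    n_svt = Counter(zip(z[:-1], chosen, z[1:]))   # N_{s,v,t}
    n_sv = Counter(zip(z[:-1], chosen))           # N_{s,v,.}
    n_s = Counter(z[:-1])                         # N_s
    p, u = defaultdict(float), defaultdict(float)
    for (s, _i, t), count in n_svt.items():
        p[(s, t)] += count / n_s[s]
    for (s, i), count in n_sv.items():
        for t, prob in extreme_points[s][i].items():
            u[(s, t)] += (count / n_s[s]) * prob
    return dict(p), dict(u)


extreme_points = {"a": [{"a": 1.0}, {"b": 1.0}], "b": [{"a": 1.0}]}
z = ["a", "a", "b", "a", "b", "a"]
chosen = [0, 1, 0, 1, 0]
p, u = average_point(z, chosen, extreme_points)
```

In this toy realization every extreme point is a point mass, so the observed transitions coincide with the average point; with genuinely random extreme points the two differ by sampling noise, which is what Lemma 6 controls.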

We now move to the core of the argument. The following lemma asserts
that if the transition $(s,v)$ occurs frequently, then with high probability,
the observed probability $p^z((s,v),t)$ of moving from $(s,v)$ to $t$ is close to
the true one, $v(t)$.



Lemma 6. Let $s,t \in S$ and $v \in V^*(s)$ be given. Then

p^{{s,v),t) 



f Nf , . max{v{t),p'{{s, v),t)} > N^/^ 
(27) 

>l-47. 



v{t) 



Proof. Note first that Nf .t;(t) < N^/"^ if v{t) < N^l^-^. Assume now 

that v{t)>N^/^-^. Let n<N be a sequence of i.i.d. Bernoulli r.v.'s with 
parameter v{t). By (26) and (Tl), 

(28) ¥,{^%^.v{t)>N'l^ and \p^{{s,v),t) - v{t)\> e'v{t)) < 
Moreover, one has 

(29) P^(Nl.,^.v{t) < N^l^ and l^%^.p^{{s,v),t) > N^'"^) < 2/N^/\ 

Indeed, let (Xi) be a sequence of i.i.d. Bernoulli r.v.s with parameter v{t), 
and set n= lN^/^/v{t)\. By Markov's inequality, the left-hand side in (29) 
is at most 

max {Xi + • • • + Xk} > N^^A < Pz(Xi + • • • + X„ > N^/"^) 

k:kv{t)<N^/^ J 

<nv{t)/N^/'^ <2/N^'^. 

To conclude, equation (27) follows from (28), (29) and the choice of 
Let $T^*$ be the set of all realizations $y = (x_0, v_0, x_1, v_1, \ldots, x_N)$ for which
the implication in (27) holds, for every $s,t \in S$ and every $v \in V^*(s)$. By

Lemma 6 and (T2), Pz(T*) > 1 - ^'^'"^^f/ ^ ^ " Wc - To conclude 
the proof of Theorem 5, it therefore suffices to show that every sequence in
$T^*$ is $(N,\delta,\varepsilon)$-typical.

Let $y$ be a sequence in $T^*$. Following earlier use, we denote by $N^y_s$ the
value of $N^z_s$ at $y$, and we use a similar convention for other random variables.
We shall verify that $y \in T(N,\delta,\varepsilon)$.

Let $s,t \in S$ be such that $N^y_s p^y(s,t) \ge N^{\delta}$. We will prove that $|1 - p^y(s,t)/u^y(s,t)| <
\varepsilon$. The argument is also valid if $N^y_s u^y(s,t) \ge N^{\delta}$. It is enough to prove that

(30) Nl,^.\py{{s,v),t)-v{s,t)\<j^Nypy{s,t) for every veV*is). 

Indeed, by summing (30) over all v G V*{s), it follows that 

A^^ f' 

\pyis,t)-uyis,t)\< -iw'\p'ii^^^)^t)-^it)\<-r^p'M\y*is)\, 

vGV*{s) 
which implies, by the choice of $\varepsilon'$, that $|p^y(s,t) - u^y(s,t)| < \varepsilon u^y(s,t)$, as
desired.

We let V £ V*{s) be given and proceed to the proof of (30). If Ng^y^. max{pJ^((s, v),t),v{t)} > 
iV'/^then, since the imphcation in (27) holds for y, one has \v{t) —p^{{s,v),t) \ < 
e'v{t). Multiplying both sides by Ng^y^., we get 

Nyy^.\py{{s,v),t)-v{t)\<e'Nl,^.v{t) 

where the last inequality holds since Nypy{s,t) = J2vev*is) ^s,v,-pyi{s,v),t), 
and (30) holds. If, on the other hand, Nly^.max{py{{s,v),t),v{t)} kN^/"^, 
then, since Nypy{s,t) > , 

Nyy^.\py{is,v),t) - v{t)\ < N'l^ < Nypy{s,t)/N'i\ 

and (30) holds by (T3). □ 

3.3. Proof of Theorem 4. We first provide a heuristic overview of the 
proof. It will be helpful to contrast it with the proof given in Section 2. 
In the basic setup, the given sequence $x$, or equivalently, the given array
$(N^x_{s,t})_{s,t \in S}$ of one-step transitions, was first extended to a periodic and exhaustive
sequence. Next, the structure theorem was used to find a certain
partition into atoms. The approximating process simply visited each atom 
in turn for a number of stages equal to the observed one. The transition 
function of the process was such that each atom was a recurrent set. It 
was obtained by watching the observed transition function on the different 
atoms. Moving from one atom to another was done in a single step. These 
last features allowed for a simple analysis. 

At a broad level, the analysis of the general problem is similar. We again
start by extending the given sequence $x$ to a periodic and exhaustive sequence
$x^*$, and by applying the structure theorem to obtain a partition
$S_1, \ldots, S_K$ of $S$ (see Section 3.3.2). As in Section 2, the approximating process
will focus on each atom in turn.

However, here we are constrained to use $V$-processes, hence, the former
choice of a transition function may not be feasible. Instead, we introduce
the transition function $v \in V$ that is closest to $p^x$ (we omit details in this
sketch). The approximating process will essentially evolve according to $v$. To
be more precise, consider a specific phase $k$. Since $S_k$ need not be recurrent
for $v$, the process may occasionally exit from $S_k$. We will then let it evolve
according to $b$, so as to re-enter $S_k$ in a few stages (recall that $b$ is a
fixed irreducible transition function). In a first approximation, the behavior
of the approximating process during phase $k$ can thus be described by the
transition function that coincides with $v$ on $S_k$, and with $b$ on $\bar S_k$.
It turns out that it is convenient to amend this definition as follows. Once
the process exits from $S_k$, a (fictitious) entry state $t$ in $S_k$ is drawn, according
to the distribution of the entry state under a specific Markov chain (again,
we omit details). The process will evolve according to $b$ until $t$ is reached. It
then switches back to $v$. Note that this no longer describes a Markov chain,
since the transition function on $S_k$ may be either $b$ or $v$, depending upon
the circumstances. This feature is best dealt with by adding a component in
the state space, which keeps track of the current status of the process. This
component takes values in $S' = S \cup \{o\}$, where $o$ is an additional symbol.
The $k$th piece of the approximating process is defined as the $S$-marginal of a
Markov chain over $S \times S'$, whose transition function $\pi_k$ is defined as follows.
Whenever the $S'$-component is set to $o$, the $S$-component evolves according
to $v$. As long as the $S$-component remains in $S_k$, the $S'$-component remains
equal to $o$. When the $S$-component exits $S_k$, then an element $t$ of $S_k$ is
selected with a given probability, and the $S'$-component of the Markov chain
is set to $t$. This $t$ is the target of the $S$-component, which evolves according
to $b$ as long as $t$ is not reached. Once $t$ is reached, the $S'$-component is set
to $o$. For the purpose of the transition from phase $k-1$ to phase $k$, the exact
definition of the transition function will be slightly different.

The new aspects raise additional difficulties. 

First, note that the set $N_0$ that appears in the statement (G2) roughly
coincides with the set of stages in which the process moves according to
$b$. In order to prove that the cardinality of this set is small compared to
$N$, one needs to prove that the expected time to reach $S_k$ under $b$ is small
compared to the expected time to leave $S_k$ under $v$. The expected time to
leave $S_k$ under $p^{x^*}$ can be derived from the sequence $x^*$. We will thus have
to compare the expected exit times from $S_k$, computed under $v$ and $p^{x^*}$. To
do that, we will use a result on the comparison of exit times from a given
set under close Markov chains.

Second, in order to prove (G1), we need to compare the empirical frequency
with which the $S$-component is $s \in S_k$ with the frequency of $s$ along
$x$. To this end, we prove, as in Section 2, that the transition function that is
defined by $v$ on $S_k$ and by $b$ on $\bar S_k$ is mixing. We then use a result relative
to close Markov chains in order to compare the invariant distribution of the
latter transition function to the invariant distribution of $p^{x^*}$.

We now describe the organization of the proof. We define the approximating
process $z$ in Sections 3.3.2 and 3.3.3. We then state in Section 3.3.4
two propositions that readily imply Theorem 4. These two propositions are
statements about the transition function $\pi_k$. The subsequent sections are
devoted to the proofs of these propositions. Sections 3.3.5 and 3.3.6 contain
the statement and the application to our framework of results on perturbed
Markov chains. These results are used in the last three sections, which conclude
the proof.
3.3.1. Fixing parameters. Let $\psi,\eta \in (0,1)$ be given. We here list a number
of conditions on $\varepsilon$, $\delta$ and $N_1$ under which the conclusion of Theorem 4
holds. We stress that we do not strive for optimal conditions.

Fix < e < 77/56L < r/, with L = ^^^I^ (If l)nl^l . 

Choose (3 G (0, ^(4)'^' x where A =1/2. Set a = 20\k\L^ and 

a' = . Note that (3 < 1/20\S\^L'^, so that a' > 2. Choose V' G (0, V), 

(0,^/^7(151 + 1)), (5' G (0,^/2). Finally, choose -5 G (0, min{(5', (1 - V')/2}). 
Set a = M. 

Choose iVi G N sufficiently large such that (C1)-(C8) and (Al)-(A7) hold, 
for every iV>iVi: (CI) N^' >N^ + l, (C2) N^>j^, (C3) 2 + 8L\S\{N + 

\S\f X {1 + \S\'^/N^)<N'^/\S\, (C4) N^~^ > 1/7?, (C5) N^~^~^' >8BL\S\, 
(C6) > 42(5 + l)\S\/£^, (C7) e7Vi+«-^ > 2{N + |5|), (C8) iV^ > 

|5|2(1 + 55eL)/e, (Al) p{N^ - 1) > (iV + (A2) iV? - 1 > (A3) 

L{N^ + ir\ < Nf, (A4) ^(^ + i? + l) < ff^f , (A5) 5(1 + 3.) (^^1^5 < 

2^<e, (A6) iV^/|S| > 1 + 2(1 + 3e)Af^(M + 1)1^1, (A7) iV« > 18|5|. 

We will prove that the conclusion of Theorem 4 holds for every $N \ge N_1$
and every typical $S$-valued sequence $x$ of length $N + 1$.

3.3.2. The periodicized sequence. Let $x$ be an $(N,\delta,\varepsilon)$-typical sequence.
Let $x^* = (x^*_0, \ldots, x^*_{N^*})$ be the periodic and exhaustive sequence that is obtained
by extending $x$ as we did in Section 2.4.3.

Since $x$ is typical, we can choose once and for all, for every $s \in S$, an
element $v(s,\cdot) \in V(s)$ such that, for every $t \in S$,

(31) $N^x_s \max\{p^x(s,t), v(s,t)\} \ge N^{\delta} \implies \left|1 - \dfrac{v(s,t)}{p^x(s,t)}\right| < \varepsilon.$

As the next lemma asserts, since $x$ is typical, so is $x^*$.



Lemma 7. $x^*$ is $(N^*,\delta',3\varepsilon)$-typical.
Proof. By (CI), 

A-f max{p^*(s,t),w(s,t)}> ATf =^ A^f max{p^(s, t), w(s, t)} > A^^ 

In that case, by (C2), |1 - ^| < | and |1 - ^| < |. This implies that 

l-*^ ~ ^^^ist) I ~ 1-*^ ~ ^N^^/N^ I — ^' Together with (31), we deduce that 



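(32) $N^{x^*}_s \max\{p^{x^*}(s,t), v(s,t)\} \ge N^{\delta'} \implies \left|1 - \dfrac{v(s,t)}{p^{x^*}(s,t)}\right| < 3\varepsilon.$

Thus, the extended sequence $x^*$ is $(N^*,\delta',3\varepsilon)$-typical. □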
The following lemma asserts that for every state $s$ that is frequently visited,
the observed transition out of $s$, $p^x(s,\cdot)$, and the transitions out of $s$
under the $V$-Markov chain $v$, are close.

Lemma 8. Let s e S be given. IfNf > N^, then \\v{s, •) —p^{s, ■)t\\oo < rj. 

Proof. Let t G S be given. If max{?;(s, i),p^'(s, t)} < ?y, then clearly 
\v{s, t) -p'^is, t)\<7]. Otherwise, by (C4), max{i;(s, t^^p'^is, t)} > -qN^ > 
Therefore, by (31) and the choice of e, \v{s,t) — p^{s,t)\ < ep^{s,t) < rj. 

□ 

3.3.3. The approximating process. We here construct the approximating 
hidden Markov chain z. Its properties will be established in later sections. 

We apply Theorem 2 to the sequence x* and a = N^, and obtain a 
partition C = Sa, . . . , Sr) of S. Let Ko = {k: iVf * > N^~^} be the fre- 
quently visited atoms. For convenience, we assume that Kq consists of the 
first l-ftTol atoms of the partition C, so that Kq = {!,..., l-ftTol}- We assume 
also that S\Xo\ the most frequently visited atom, so that, in particular, 
Ni:^>N/\S\. 

For $k \in K_0$, we define a transition function $\pi_k$ over $\Omega = S \times (S \cup \{o\})$ as
follows:

1. At state $(s,o)$, $s \in S_k$. A state $s' \in S$ is first drawn according to $v(s,\cdot)$.
If $s' \in S_k$, the chain moves to $(s',o)$; if $s' \notin S_k$, a state $t \in S_k$ is drawn
according to $P_{s',p^{x^*}}(T_{S_k} = T_t)$ and the chain moves to $(s',t)$.

2. At state $(s,t)$, $s \ne t$ and $t \in S_k$. A state $s' \in S$ is first drawn according to
$b(s,\cdot)$. If $s' = t$, the chain moves to $(s',o)$. Otherwise, the chain moves to
$(s',t)$.

3. At state $(s,t)$, $s \notin S_k$ and $t \in \bar S_k \cup \{o\}$. A pair $(s',t') \in S \times S_k$ is drawn with
probability $b(s,s') \times P_{s',p^{x^*}}(T_{S_k} = T_{t'})$. If $s' = t'$, the chain moves to $(s',o)$.
Otherwise, the chain moves to $(s',t')$.

Other states are visited with probability 0. Note that the $S$-marginal of
$\pi_k((s,t),\cdot)$ is either $v(s,\cdot)$ or $b(s,\cdot)$ and, in particular, belongs to $V(s)$.

Plainly, a chain with transition function $\pi_k$ always moves in a single step
to a state in $(S \times S_k) \cup (S_k \times \{o\})$. In particular, the third item in the
definition of $\pi_k$ may possibly be relevant only at the initial stage. Note that
the $S$-coordinate behaves under $\pi_k$ as described in the overview. Starting
from $S_k \times \{o\}$, it moves according to $v$, unless it exits $S_k$. In that case, a
target state in $S_k$ is drawn according to the distribution of the entry state
computed with $p^{x^*}$. Then the $S$-coordinate moves according to $b$ until it
reaches the target state. At this point, the target flag is removed, and the
$S$-coordinate resumes moving according to $v$.
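
The three-item definition can be turned into a one-step simulator. The Python sketch below is a reading of that definition under stated assumptions: the entry-state law $P_{s',p^{x^*}}(T_{S_k} = T_t)$ is supplied as a precomputed table `entry` (how to compute it is not shown), item 3 is read as applying to every state whose first coordinate lies outside $S_k$ and whose second coordinate is not in $S_k$, and the toy data are invented for the example.

```python
import random

O = "o"   # the additional symbol: no target is currently set


def draw(dist, rng):
    """Draw one key of dist (dict outcome -> probability)."""
    items, probs = zip(*dist.items())
    return rng.choices(items, weights=probs, k=1)[0]


def step(state, S_k, v, b, entry, rng):
    """One transition of the chain on S x (S ∪ {o}) during phase k.

    v, b  : dict s -> dict t -> prob (rows of v and of b)
    entry : dict s' -> dict t -> prob, a stand-in for the entry-state law
    """
    s, target = state
    if target == O and s in S_k:               # item 1: move according to v
        s_next = draw(v[s], rng)
        if s_next in S_k:
            return (s_next, O)
        return (s_next, draw(entry[s_next], rng))
    if target in S_k and s != target:          # item 2: head for the target with b
        s_next = draw(b[s], rng)
        return (s_next, O) if s_next == target else (s_next, target)
    if s not in S_k:                           # item 3: only at the initial stage
        s_next = draw(b[s], rng)
        t_next = draw(entry[s_next], rng)
        return (s_next, O) if s_next == t_next else (s_next, t_next)
    raise ValueError("state visited with probability 0")


S_k = {"a", "b"}
v = {"a": {"a": 0.5, "b": 0.4, "c": 0.1}, "b": {"a": 0.5, "b": 0.5}}
b = {"a": {"b": 1.0}, "b": {"c": 1.0}, "c": {"a": 1.0}}
entry = {"a": {"a": 1.0}, "b": {"b": 1.0}, "c": {"a": 1.0}}

rng = random.Random(0)
state, path = ("c", O), []
for _ in range(2000):
    state = step(state, S_k, v, b, entry, rng)
    path.append(state)
```

Every state on `path` lies in $(S \times S_k) \cup (S_k \times \{o\})$, which is the single-step property noted above.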
The approximating hidden Markov chain has $|K_0|$ pieces. The length of
piece k is Ng^ , except that of piece |i^o|: its length is N — J2k<\KQ\ ' which 
is between N^*^ ^ and Ng*^ ^ + I^IA^"^"^. In piece k the process follows the 

transition function $\pi_k$. We denote by $m_k$ the length of piece $k$.

Formally, we let $w$ be a piecewise homogeneous Markov chain over $\Omega$
whose transition function coincides with $\pi_k$ at stages $\sum_{j<k} m_j \le n < \sum_{j \le k} m_j$.

We define $z$ to be the first component of $w$, so that it is a piecewise hidden
Markov chain. Thus, for every stage $n$ in piece $k$, the conditional law of
$w_{n+1}$ is $\pi_k(w_n,\cdot)$. The initial state of $w$ is irrelevant. We will prove that the
process $z$ satisfies both (G1) and (G2).

For the convenience of the proof, the definition of the boundaries of the
$k$th piece slightly differs from the one in Section 2.4.2.

3.3.4. Two propositions. We here state two propositions relative to $\pi_k$,
without proof. We next show why Theorem 4 follows from these propositions.
As a consequence, the proof of Theorem 4 reduces to statements about
Markov chains.

In Proposition 2 below, $\nu_{m_k}(s,o)$ is the empirical frequency of visits
to the state $(s,o)$ in stages $0$ through $m_k - 1$. Recall that $\nu^{x^*}(s \mid S_k) =
\nu^{x^*}(s)/\nu^{x^*}(S_k)$.



Proposition 2. Let $k \in K_0$ be given. For every $\omega \in \Omega$ and every $s \in S_k$,
P. 



In effect, Proposition 2 contains two statements. By summation over $s \in
S_k$, it implies that, with high probability, the empirical frequency of $S_k \times \{o\}$
is close to one. It also says that the empirical frequency of $(s,o)$ is close to
the observed frequency of $s$ along the sequence $x^*$, when conditioned on $S_k$.

In Proposition 3 below, $N_{\overline{S_k \times \{o\}}} = |\{n < m_k : w_n \notin S_k \times \{o\}\}|$ is the
number of stages in which the $S$-coordinate of the state evolves according
to $b$ rather than according to $v$. Recall that $B = \max_{s,t \in S} E_{s,b}[T_t]$ is a bound
on the expected time under $b$ to reach any given state.

Proposition 3. Let $k \in K_0$ and $\omega \in \Omega$ be given. One has

In effect, Proposition 3 asserts that the total number of stages the process 
spends outside Sk x {o} is small. As a consequence, the empirical frequency 
of Sk X {o} is close to one. This statement, however, differs from Proposition 
2, since here the result is phrased in expected terms. 
We now show why the conclusions of Theorem 4 follow from Propositions
2 and 3.

We begin with (G2) and let mk_i <n< rrik. For every s € S^, one has, by 
Theorem 2(P2), A^^^ > > iV«. Hence, by Lemma 8, \v{s, ■) -p'^is, ■)\<r]. 
By definition of $\pi_k$, the $S$-marginal of $\pi_k(w_n,\cdot)$ is equal to $v(z_n,\cdot)$ whenever
$w_n \in S_k \times \{o\}$. In particular, $w_n \in S_k \times \{o\}$ implies $n \notin N_0$. Denote by
$N_{0,k}$ the number of visits to states outside $S_k \times \{o\}$ during the $k$th piece.



By construction, $E_z[N_{0,k}] \le \sup_{\omega \in \Omega} E_{\omega,\pi_k}[N_{\overline{S_k \times \{o\}}}]$, so that by Proposition
3,

(33) $E_z[N_0] \le \sum_{k \in K_0} E_z[N_{0,k}] \le B N^{\psi},$

and (G2) follows.

We next check that (Gl) holds. Fix seS, such that i/^(s) > l/N^. By 
construction s € for some k € Kq. We introduce the frequency (s, o) = 
^[{"ifc <n < rrik+i :w„ = (s,o)}| of visits to (s,o) in piece k. Note that 
the difference Ni^f^(s) — mfci>^^(s,o) is the sum of two terms: (i) the total 
number of visits to s in phases other than k, and (ii) the total number of 
visits to {s} X S during phase k. As a consequence, Niyf^{s) — m/^D^^ {s, o) < 
J2keKo^o,k- By (33), Markov's inequality and (C6), one has 

(34) F 



N \ BN^+'l" 



< 



< 



Note that the conditional distribution of P^^(s,o), given wq, . . . i^mui coin- 
cides with the distribution of ^'^^.(5,0) under a Markov chain starting from 
and with transition vTfc. Hence, by Proposition 2, 



(35) 



1 



>>°) 



-{s\Sk) 



> 55eL < 



1 



'As, 



By (34) and (35), the probability that both inequalities Nv'f^{s) — rriki'^^ 



eN^-^ and 11 - 
(C8), 

Vn{s) - 



{s\Sk) 



< 55eL hold is at least 1 



1 



< 



is,' 



m 



+ 



m 

1 



is, 



\s\Sk)\ 



<eN-^ + j^\S\^N- 



N: 



Nf^i>Z,{s,o) + -^x55eLu^\s\Sk) 



N 



< 
N+\S




\S\'^N~\l + 55eL)i/^*(s|5fc) + 55eLi^^* (s) 



<56eLi/^*(s). 
By the choice of $\varepsilon$, this proves (G1).

3.3.5. Perturbation of Markov chains: reminder. We here introduce a
result on perturbations of Markov chains due to Solan and Vieille (2003).
This result provides an estimate of the sensitivity of the stationary distribution
and other statistical quantities with respect to perturbations of the
transition function. This result will be applied to our setup in the next
section.

Given $C \subseteq S$ with $|C| \ge 2$, and an irreducible transition function $p^1$ over
$S$ with invariant measure $\mu^1$, set

$$\zeta_{p^1} = \min_{\emptyset \subset D \subset C}\ \sum_{s \in D} \mu^1(s)\, p^1(s, \bar D).$$

This is a variation of the conductance of a Markov chain that was originally
defined by Jerrum and Sinclair (1989) and was used in the study of the
rate of convergence to the invariant measure [see, e.g., Lovász and Kannan
(1999) and Lovász and Simonovits (1990)].

The notion of closeness we use is the following one:

Definition 7. Let $p^1$ be an irreducible transition function on $S$ with
invariant measure $\mu^1$, let $C \subseteq S$ with $|C| \ge 2$, and let $\beta,\varepsilon > 0$. A transition
function $p^2$ is $(\beta,\varepsilon)$-close to $p^1$ on $C$ if (i) $p^2(s,\cdot) = p^1(s,\cdot)$ for every $s \notin C$,
and (ii) $|1 - p^2(s,t)/p^1(s,t)| < \varepsilon$ for every $s,t \in C$ such that
$\mu^1(s)\max\{p^1(s,t), p^2(s,t)\} \ge \beta\zeta_{p^1}$.

This definition is not symmetric since it involves the invariant distribution
of $p^1$, and not that of $p^2$. It requires that the relative probabilities of moving
from $s$ to $t$ under $p^1$ and under $p^2$ are close, but only for those one-step
transitions $s \to t$ that, on average, occur relatively frequently.

The next result summarizes Theorems 4 and 6 in Solan and Vieille (2003).
It asserts that if $p^1$ and $p^2$ are two transition functions that are close in the
sense of Definition 7, then their invariant measures are close, as well as other
statistical quantities of interest, such as the average length of a visit to a
set and the exit time from a given set. Recall that $L = \sum_{n=1}^{|S|} \binom{|S|}{n} n^{|S|}$, and
that $T_C$ is the first hitting time of the set $C$. The quantity $K_p(C)$ has been
introduced in Section 2.3.2.
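
For small state spaces the closeness condition of Definition 7 can be checked by brute force. The Python sketch below makes several assumptions that should be kept in mind: the conductance-like quantity is read as the minimum, over nonempty proper subsets $D$ of $C$, of the stationary one-step flow from $D$ to its complement in $S$; the ratio in condition (ii) is taken as $p^2/p^1$; and the invariant measure `mu1` is passed in rather than computed.

```python
from itertools import combinations


def zeta(p1, mu1, C, S):
    """Minimum over nonempty proper subsets D of C of the flow out of D."""
    C = list(C)
    best = float("inf")
    for size in range(1, len(C)):
        for D in combinations(C, size):
            flow = sum(mu1[s] * p1[s].get(t, 0.0)
                       for s in D for t in S if t not in D)
            best = min(best, flow)
    return best


def is_close(p1, p2, mu1, C, S, beta, eps):
    """Check conditions (i) and (ii) of the closeness definition, as read above."""
    threshold = beta * zeta(p1, mu1, C, S)
    for s in S:
        if s not in C:
            # (i): the two transition functions agree outside C.
            if any(abs(p1[s].get(t, 0.0) - p2[s].get(t, 0.0)) > 1e-12 for t in S):
                return False
            continue
        for t in C:
            q1, q2 = p1[s].get(t, 0.0), p2[s].get(t, 0.0)
            # (ii): only frequent transitions are constrained.
            if mu1[s] * max(q1, q2) >= threshold:
                if q1 == 0.0 or abs(1 - q2 / q1) >= eps:
                    return False
    return True


S = ["a", "b"]
p1 = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.5, "b": 0.5}}
p2 = {"a": {"a": 0.52, "b": 0.48}, "b": {"a": 0.5, "b": 0.5}}
mu1 = {"a": 0.5, "b": 0.5}
```

The enumeration over subsets is exponential in $|C|$, so this is only meant as a way to experiment with the definition on toy chains.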
Proposition 4. Let e g (0, l/2l^l), A>Q, and p e (0, Ut)^^



) 

be given. Let be an irreducible transition function defined over S with 
invariant measure fi^ . Assume that \C\ > 2 and that Ps pi(T^^ < T^) > A 
for every s,t £C. Let p^ be {P,e)-close to p^ on C. Then: 

(a) All states of C belong to the same recurrent set for p'^ . Let fi'^ be the 
invariant measure of p^ on that recurrent set. 

(b) For every s C and every D cC, one has 

(37) L-i<^£i!!M<L 



1 K„2(D) 
and L-^ < J )' < L. 



(c) Let X £ (0)/3Cpi] be any number such that, for every s,tGC, 



(38) fi\s)max{p\s,t),p\s,t)}>x 

Then either 
(39) 



p\s,t) 



p\s,t) 



L-^K„i{C)<K„2{C) < LK„i{C), or 



(40) K„.{C)> 



1 



2151 



X 



and Kp2 (C) > 



1 



1 



L 2\S\ 



X 



qk 



3.3.6. Perturbation of Markov chains: application. We here introduce
the auxiliary transition function $q_k$ on $S$ defined by

$$q_k(s,\cdot) = \begin{cases} v(s,\cdot), & s \in S_k, \\ b(s,\cdot), & s \in \bar S_k. \end{cases}$$

In Lemmas 9 and 10 below, we first check that the conditions of Proposition 4
are fulfilled by $q_k$ and $p^{x^*}$, as soon as $S_k$ is not a singleton. Next, relying
on Proposition 4, we provide estimates of the mixing measures $\lambda_{q_k}(S_k)$ and
$\rho_{q_k}(S_k)$ (see Proposition 5 below). These estimates will later be used to
relate the properties of $q_k$ to those of the transition function $\pi_k$ over $\Omega$.

Lemma 9. If $|S_k| \ge 2$, the transition function $q_k$ is $(\beta,3\varepsilon)$-close to $p^{x^*}$
on $S_k$.



Proof. By Definition 7 and (32), it suffices to prove that f3C^^, > -j^. 



Nf 



For each C C Sk, one has by (P2) and since a = 
(41) Vz/^*(g)p^*(s,C) = ^'^ > 



sec 
follows by (A1). □



By taking the minimum over C dSk, this yields O > The result 



Recall that we set A = 1/2. 

Lemma 10. If $|S_k| \ge 2$, one has $P_{s,p^{x^*}}(T^+_t < T_{\bar S_k}) \ge A$ for every $s,t \in
S_k$.

Proof. Suppose first that s 7^ t , so that P^ ^,* (T^ <T-^J = p^^^,* [Tt < 
Tg^). By Lemma 2, 

^\^'h^..{Sk)-{\Sk\-l)p,.^{Sky 
By Lemma 5, p^s* {Sk) < {Sk), so that by (A7), 

(42) P,.,.. (T, < Tj^) > 1 - 2|S,|^-^^|jXi^ >\>A. 

Suppose now that t = s. Since \Sk\ > 2, p^* {s,Sk) < < ^ = 77^1 so that 
by (42) and (A7), 

teSk\{s} 

111 

By Lemmas 9 and 10, we can apply Proposition 4 to and qk with 
^ = 1/2. Recall that a = 1/(2/? |S| L^). 

Proposition 5. Assume that \Sk\ > 2. T/ien Xq^^{Sk) > apq^{Sk) and 

1 ^s* 
\kiSk) > 2L\s\^- 

Proof. We first provide a lower bound on K^^* {Sk)- By (16), one has 
f43) K^^*{Sk) ^ 1 ^ > 



(^fc) E.es, {s, Sk) N-^^^ - 

We now prove the first assertion. For C C Sk, by (43) and (P2), 

N^u^ {Sk) ^ iV* ^ ^ i^j^ 1^ '^'^ ('S'fc) 
C and using (A2), this yields



where the last inequality follows since > ^ . By optimizing over 



(44) K^^<S,)>-l-x''^'^^'^ 



Using (39) and (40) with x = PCp^ , (44) yields 



1 1 u-*iS,) 



(45) V (Sk) > K,^{S,) > - X X 

Fix Cc5fe. By (45) and (37), 

I i/'^*(C) 

2P'\S\L EseC^^*is)p-'is,C) 2(3\S\L 

The first assertion follows by taking the maximum over C. 

We now prove the second assertion. By (32) and since 5' Ktp' , (38) holds 

with X = ~^ ■ We distinguish two cases. If K^^:* (5^) > ^f^r ^ then 
by (40), 



11 A^f 1 1 Nf 



as desired. If, on the other hand, K^x*{Sk) < ^Tsi N x'v ' then by (39) and 
(43), 



L Rf^-L - L(iV€ + 1)1^1' 

which by (A3) gives the result. □ 

3.3.7. Proof of Proposition 3 when $|S_k| \ge 2$. By Proposition 4, there
is a recurrent set for $q_k$ that contains $S_k$. Therefore, there is a recurrent
set $\Omega_k$ for $\pi_k$ that contains $S_k \times \{o\}$. We denote by $\mu_{\pi_k}$ the invariant
measure of $\pi_k$ on $\Omega_k$. We first prove that $\mu_{\pi_k}$ assigns a significant weight to
$S_k \times \{o\}$.

Lemma 11. // \Sk\ > 2, then fi^^{Sk x {o}) > 1 - > \. 
Proof. By Proposition 5 and (C5), 1 - > I-



We now prove that /x^,(S'fcx{o}) > l - . Plainly, E(, t),,rfc[^Sfex{o}] = 

E<i^6[Tj] < B for every t S 5^ and every s E 5 \ {t}. 

On the other hand, by Lemma l(ii). Proposition 5 and by the choice of 

/?, 

mm E(,,„),., [T-^] = mm E,,,, [r;^J 
By (18), one gets 



f^n,{SkX{o}) ^ 2B 



^7rfc(5'fe X {o}) Xq^,{Sk)' 



hence Hn^Sk x {o}) < j^^. 

We now proceed to the proof of Proposition 3 when \Sk \ > 2. Observe first 
that 



min 'Erj tt, [Nt; — rr'l = min E^ -j^. [Nt; — ttI ■ 

Since fi-j^^ is the invariant measure of vr^ over 0^, one has E^^^^tt^. [N^^^-^^^^J = 
mk^i-w^i^k \ {Sk X {o}))- By Lemma 11 this yields 

(46) min E^,,, [N^-r^] < 25 x 

Let 7 = max^gs'^xjo} E<^,7rfc[N^^^^^^], and let wi G 5^ x {o} achieve the 
maximum. Since the S-marginal of i^k coincides with h outside Sk x {o}, 
one has, for uj G Sk x {o}, 

7 = E.„., [N^^] < E^,., [N^^] + P,^,,^(r;^-^ < T^){B + 7). 

By Lemma 2 and Proposition 5, Puii,nk{ Tg^^^^y < T^) < a^2~\s\ ~ • 
Since a' > 2, one gets, by letting lj vary, 

7 < mm E^ ^, Nt; — H 

' - a'- l..eSfcx{o} ""'^^^ 5fcx{o}J^^/_i 

(47) 

<2 min E^,,JN^-T^] + 5. 



Finally, for each G fi, by (47), (46), and Proposition 5, 

max 

i^e5fcx{o} 

^G5feX{o} 



E^',., [N^^] < E^.,,, [T^^ X {o}] + ^^max^^ E^,^, ^^^^ 
<B + B + 2 _min_ E^,^JN^^] 
rrik 



<2B + 8BL\S\Nf 



rrik 

X 



^1. 



Since mk/N§l is either 1 (if A: < \Ko\) or at most 1 + \S\'^/N^ (if k = \Kq\), 
the desired result follows by (C3). □ 



3.3.8. Proof of Proposition 2 when $|S_k| \ge 2$. To prove Proposition 2 when
$|S_k| \ge 2$, we first prove that $\pi_k$ is mixing (see Lemma 13). We can then apply
Theorem 3 as we did in the proof of Theorem 1. We are therefore able to
compare the empirical frequency $\nu_{m_k}$ to $\mu_{\pi_k}$. Since $p^*$ and $q_k$ are close, this
enables us to compare the invariant measure of $\pi_k$, $\mu_{\pi_k}$, to the invariant
measure of $p^*$, $\nu^*$.



Lemma 12. If $|S_k| \ge 2$, then, for every $\omega \in \Omega_k$ and every $s \in S_k$, one
has 



+ {\Sk\-l)p,,{Sk) + 2B 



Proof. The proof is a simple adaptation of the proof of Lemma 3. We
repeat it, with few modifications. Let $\omega$ and $s \in S_k$ be given. Note that

(48) E^,,, [r+ „ J < 1 + max E,, [r(,,„)] . 

Set $\gamma = \max_{t \in S_k} \mathbf{E}_{(t,o),\pi_k}[T_{(s,o)}]$. Let $t \in S_k \setminus \{s\}$ achieve the maximum in the
definition of $\gamma$. By Lemma 1,

7 = E(t,o),^, [T(,,„)] < Et,,, [Ts^^^.y] + Pt,g, (% < T,) (7 + S) 

<{\Sk\- l)pq,{Sk)+B + jx maxP„,„(% < T,). 

ueSk " 



Therefore 

(49) 7 < 



i\Sk\-l)p,,iSk) + B 



min„e5^P„,„(T^ <%) 
For $\omega \in \Omega_k \setminus (S_k \times \{o\})$,

(50) $\mathbf{E}_{\omega,\pi_k}[T_{(s,o)}] \le B + \gamma$.

The result follows from (48)-(50). □ 
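
For intuition about the expected hitting and return times that Lemmas 12 and 13 bound, these quantities can be computed numerically for a small chain. The following is a minimal sketch in Python, not the construction used in the paper: the three-state transition matrix and the function names are hypothetical and chosen only for illustration. It uses the one-step decomposition of a hitting time (one step, then the hitting time from the new state), which also shows why a return time is at most one plus the largest hitting time.

```python
def expected_hitting_times(P, target, iterations=2000):
    """Approximate h[x] = E_x[T_target] for a finite chain with matrix P.

    Uses fixed-point iteration on h(target) = 0 and
    h(x) = 1 + sum_y P[x][y] * h(y) for x != target.
    Assumes (hypothetically) that target is reached from every state.
    """
    n = len(P)
    h = [0.0] * n
    for _ in range(iterations):
        h = [
            0.0 if x == target
            else 1.0 + sum(P[x][y] * h[y] for y in range(n))
            for x in range(n)
        ]
    return h


def expected_return_time(P, target, iterations=2000):
    """Approximate E_target[T^+_target]: one step, then hit target."""
    h = expected_hitting_times(P, target, iterations)
    return 1.0 + sum(P[target][y] * h[y] for y in range(len(P)))


# Hypothetical three-state example (illustration only).
P_example = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
hitting = expected_hitting_times(P_example, target=0)
return_time = expected_return_time(P_example, target=0)
```

For this example the hitting times of state 0 from states 1 and 2 should come out close to 6 and 8, and the return time to state 0 close to 4, which is at most one plus the largest hitting time.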
The lemma below is a mixing-type result. It is very similar to Lemma 1.

Lemma 13. If $|S_k| \ge 2$, then, for every $\omega \in \Omega_k$ and every $s \in S_k$,
E.,.,[T+„)]<2|5|L^^^ + 4i3 + l. 

Proof. We repeat the proof of Proposition 1 with minor adjustments. 
By Lemma 12, 



+ i\Sk\-l)pg,{S,)+2B 

^^,^.[^is,o)\ - ^^^^^^ p^_^(j.^ < ^_ ) + ^- 



By Lemma 2, the denominator is at least 1 — 2|5'fc|^; — (^Sk)-{' \ 's'k'\-i)p .(Sk) — ^' 
where the inequality follows by Proposition 5 and the choice of /?. Therefore 

E.,.jr+„)]<2|5fc|p,,(5fc)+4i3 + l. 
By (37), Lemma 5 and Theorem 2(P2), 

N^* N 
P,kiSk) < Lp^.* (5.) < L max < L-^. 

The result follows. □ 

Define $\mu^{o}_{\pi_k}(s) = \mu_{\pi_k}((s,o)) / \mu_{\pi_k}(S_k \times \{o\})$. It is the invariant measure of
$\pi_k$ conditioned on $S_k \times \{o\}$.
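
The conditioning step in this definition can be illustrated on a toy chain. The sketch below is an illustration under stated assumptions, not the paper's construction: the transition matrix, the chosen subset, and the function names are hypothetical. It approximates the invariant measure of an aperiodic irreducible chain by power iteration and then renormalizes it on a subset of states, which is the operation of dividing by the mass of the subset.

```python
def invariant_measure(P, iterations=2000):
    """Approximate the invariant measure mu (mu = mu P) by power iteration.

    Assumes (hypothetically) that P is irreducible and aperiodic,
    so that the iteration converges from the uniform distribution.
    """
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iterations):
        mu = [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]
    return mu


def condition_on(mu, subset):
    """Return mu restricted to subset and renormalized to total mass 1."""
    mass = sum(mu[x] for x in subset)
    return {x: mu[x] / mass for x in subset}


# Hypothetical three-state example (illustration only).
P_toy = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
mu_toy = invariant_measure(P_toy)
mu_conditioned = condition_on(mu_toy, subset=[0, 1])
```

For this matrix the invariant measure should be close to (1/4, 1/2, 1/4), so conditioning on the first two states gives weights close to 1/3 and 2/3.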

The following compares the empirical number of visits to (s, o) to the 
invariant measure. 

Proposition 6. If $|S_k| \ge 2$, then 

P^.TT,- (|i^m,(s,o) - M°,(S)| > Y^^^lki') + ^ 

Proof. By Remarks 2 and 3, Lemma 13 and (A4), for every $\omega \in \Omega_k$,



p 



(\^mk{s,o) - Pnk{is,o))\ >e^^,((s,o)) + — 



(51) <^(2\S\L-^ + AB + l]< ' 



e'^ruk V iV« - 1 / ~ 2N^ ' 

Since fj,T^f,{{s,o)) > 1/2, by Lemma 11, Proposition 5 and by the choice of 

$\beta$,

\f^n^i{s,o)) - fJ'l^{s) \ < 2p^,ink \ {Sk X {o})) X fln,{s,o) 
- \ /Q \ ^ /^vrfe(s,o) <e/i7rfe(s,o). 
Therefore, if \iy^^{s,o) - /i^^((s,o))| < efi^^{{s,o)) + then \umk{s,o) - 
< + 7k^- The result follows by (51). □ 

We are now in a position to prove Proposition 2. Observe that the invari- 
ant measure of conditioned on Sk is simply {-{Sk)- By Lemmas 9 and 
10, and Proposition 4, 

- u-*is I Sk)\ < 18 X 3eLu-*{s\Sk). 
The claim follows by Proposition 6, the choice of e and since by (P2) and 
(C7) --*(.|5.)>#>f >^>^. 

3.3.9. The singleton case. We here assume that $S_k = \{s\}$ is a singleton.
The next lemma is an analog of Lemma 11. It bounds from below the
invariant distribution of $\pi_k$ on $S_k \times \{o\}$. Its proof is, however, significantly
different.

Lemma 14. One has > 1 - ^(1 + 3e) ^^^t-l"" • 



Proof. By Theorem 2(P1) 



P"" is,S\{s})<-^< 



Using (32), this yields 

v{s,S\{s}) <{l + 3e) ^ ^IJ . 

We apply (18) to $p = \pi_k$, $S = \Omega_k$ and $C = \{(s, o)\}$, and we get

l-;»^,((^,o)) ^ ^ , c\ r iWRn^-^ ^M±ll^ 
^.,((.,o)) ^ ^ \ ^ + ATi-^ • 

The desired result follows. □ 

The rest of the proof for the singleton case follows closely the proof for
$|S_k| \ge 2$. We first prove Proposition 2 in that case. By the definition of $\pi_k$,
maXt^gQ^, E^;^^j.[T^'^^^] < B + 1. Therefore, using Remark 2 with Hk, e and 
uj = {s, o), and by (C6), 



(WmAs,°) - >e^7rfe((s,o)) + 



\1iB -\- 1) 

Pc^.TTfe ( \l^mk PtTTfe ((S, o)) I > e^^j, ((s, o)) -- 1 < ^ 

^ 1 

- 2\S\N^' 
By Lemma 14 and (A5), $|\mu_{\pi_k}((s, o)) - 1| \le \varepsilon$. The conclusion of Proposition
2 follows. Observe that we also deduce that $\mu_{\pi_k}((s, o)) \ge 1/2$.

We now prove Proposition 3. Fix $\omega \in \Omega_k$. Since $\mu_{\pi_k}((s,o)) \ge 1/2$,



<B + 2E^^^,^JN^^] <B + 2mfc(l - ((s, o))). 



By Lemma 14 and (A6) this last quantity is at most 

B + 2NB{\ + 3£) ^ < BN^/\S\, 

and Proposition 3 follows. 